0%| | 0/1115 [00:00> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:38:51,827 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 19:38:53,144 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:38:53,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 19:38:55,111 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:38:55,756 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 19:38:57,030 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:38:57,693 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 19:38:58,947 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:38:59,582 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 19:39:00,815 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:39:01,441 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 19:39:02,687 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-25 19:39:03,343 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 19:39:04,586 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:39:05,246 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 19:39:06,490 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:39:07,250 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 19:39:08,511 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:39:09,164 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 19:39:10,439 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:39:11,086 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 19:39:12,340 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:39:12,966 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 19:39:15,120 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-25 19:39:15,785 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 19:39:17,011 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:39:17,632 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 19:39:18,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:39:19,516 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 19:39:20,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:39:21,405 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 0%| | 1/1115 [00:32<9:57:12, 32.17s/it][WARNING|modeling_bart.py:1051] 2022-03-25 19:39:22,658 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:39:23,265 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 19:39:24,505 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:39:25,135 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 19:39:26,360 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-25 19:39:27,005 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 19:39:28,226 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:39:28,848 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 19:39:30,102 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:39:30,742 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 19:39:31,957 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:39:32,578 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 19:39:33,801 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:39:34,442 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 19:39:35,659 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:39:36,298 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 19:39:37,484 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-25 19:39:38,085 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 19:39:39,298 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:39:39,935 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 19:39:41,143 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:39:41,760 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 19:39:42,968 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:39:43,605 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 19:39:44,794 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:39:45,408 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 19:39:46,611 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:39:47,248 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 19:39:48,449 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-25 19:39:49,058 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 19:39:50,270 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 8.857, 'learning_rate': 0.0, 'epoch': 0.01} [WARNING|modeling_utils.py:388] 2022-03-25 19:39:50,904 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 0%|▏ | 2/1115 [01:01<9:27:39, 30.60s/it][WARNING|modeling_bart.py:1051] 2022-03-25 19:39:52,184 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:39:52,821 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 19:39:54,012 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:39:54,621 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 19:39:55,823 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:39:56,433 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 19:39:57,628 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:39:58,260 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 19:39:59,473 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-25 19:40:00,103 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 19:40:01,274 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:40:01,906 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 19:40:03,094 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:40:03,704 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 19:40:04,896 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:40:05,531 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 19:40:06,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:40:07,335 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 19:40:08,522 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:40:09,134 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 19:40:10,324 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-25 19:40:10,955 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 19:40:12,129 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:40:12,764 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 19:40:13,951 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:40:14,560 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 19:40:15,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:40:16,349 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 19:40:17,524 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:40:18,157 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 19:40:19,342 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 8.7704, 'learning_rate': 6e-07, 'epoch': 0.01} [WARNING|modeling_utils.py:388] 2022-03-25 19:40:19,994 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 0%|▏ | 3/1115 [01:30<9:15:04, 29.95s/it][WARNING|modeling_bart.py:1051] 2022-03-25 19:40:21,281 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-25 19:40:21,870 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 19:40:23,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:40:23,607 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 19:40:24,791 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:40:25,426 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 19:40:26,591 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:40:27,211 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 19:40:28,380 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:40:29,021 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 19:40:30,192 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:40:30,814 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 19:40:31,982 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-25 19:40:32,584 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 19:40:33,737 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:40:34,344 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 19:40:35,513 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:40:36,114 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 19:40:37,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:40:37,893 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 19:40:39,054 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:40:39,674 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 19:40:40,840 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:40:41,458 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 19:40:42,620 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-25 19:40:43,219 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 19:40:44,403 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:40:45,021 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 19:40:46,185 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:40:46,803 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 19:40:47,957 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 8.6839, 'learning_rate': 1.2e-06, 'epoch': 0.02} [WARNING|modeling_utils.py:388] 2022-03-25 19:40:48,573 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 0%|▎ | 4/1115 [01:59<9:04:22, 29.40s/it][WARNING|modeling_bart.py:1051] 2022-03-25 19:40:49,823 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:40:50,425 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 19:40:51,576 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:40:52,173 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 19:40:53,321 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-25 19:40:53,922 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 19:40:55,099 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:40:55,696 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 19:40:56,860 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:40:57,482 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 19:40:58,645 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:40:59,262 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 19:41:00,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:41:01,025 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 19:41:02,172 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:41:02,783 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 19:41:03,918 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-25 19:41:04,505 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 19:41:05,661 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:41:06,253 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 19:41:07,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:41:08,003 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 19:41:09,137 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:41:09,725 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 19:41:10,867 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:41:11,483 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 19:41:12,627 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:41:13,250 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 19:41:14,386 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-25 19:41:15,004 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 19:41:16,187 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:41:16,804 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 0%|▎ | 5/1115 [02:27<8:56:05, 28.98s/it][WARNING|modeling_bart.py:1051] 2022-03-25 19:41:18,053 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:41:18,648 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 19:41:21,520 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:41:25,043 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:41:28,569 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:41:32,052 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:41:35,571 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:41:39,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:41:42,566 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... 1%|▍ | 6/1115 [02:55<8:50:08, 28.68s/it][WARNING|modeling_bart.py:1051] 2022-03-25 19:41:46,165 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:41:49,695 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:41:54,115 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:41:57,547 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:42:01,005 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:42:04,532 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:42:08,004 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:42:11,455 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-25 19:41:46,165 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:42:11,455 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 19:41:46,165 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|▌ | 7/1115 [03:24<8:50:41, 28.74s/it] Setting `use_cache=False`...1] 2022-03-25 19:41:46,165 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|▌ | 7/1115 [03:24<8:50:41, 28.74s/it][WARNING|modeling_bart.py:1051] 2022-03-25 19:42:15,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:42:18,556 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 19:42:15,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:42:18,556 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 19:42:15,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:42:22,083 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 19:42:15,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:42:22,083 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 19:42:15,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:42:25,526 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-25 19:42:15,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:42:28,938 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 19:42:15,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:42:28,938 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 19:42:15,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:42:32,394 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 19:42:15,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:42:32,394 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 19:42:15,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:42:35,854 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 19:42:15,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:42:39,280 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 19:42:15,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:42:39,280 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-25 19:42:15,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:42:39,280 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 19:42:15,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|▌ | 8/1115 [03:52<8:44:55, 28.45s/it][WARNING|modeling_bart.py:1051] 2022-03-25 19:42:42,840 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|▌ | 8/1115 [03:52<8:44:55, 28.45s/it][WARNING|modeling_bart.py:1051] 2022-03-25 19:42:42,840 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:42:46,372 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 19:42:42,840 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:42:46,372 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 19:42:42,840 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:42:49,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 19:42:42,840 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:42:53,206 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 19:42:42,840 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:42:53,206 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-25 19:42:42,840 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:42:56,689 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 19:42:42,840 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:42:56,689 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 19:42:42,840 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:43:00,158 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 19:42:42,840 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:43:03,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 19:42:42,840 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:43:03,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 19:42:42,840 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:43:07,004 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 19:42:42,840 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|▋ | 9/1115 [04:20<8:40:21, 28.23s/it] Setting `use_cache=False`...1] 2022-03-25 19:42:42,840 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
1%|▋ | 9/1115 [04:20<8:40:21, 28.23s/it] Setting `use_cache=False`...1] 2022-03-25 19:42:42,840 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 1%|▋ | 9/1115 [04:20<8:40:21, 28.23s/it][WARNING|modeling_bart.py:1051] 2022-03-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:43:14,062 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:43:14,062 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:43:17,512 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:43:17,512 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:43:17,512 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:43:17,512 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 7.4337, 'learning_rate': 4.8e-06, 'epoch': 0.04}
{'loss': 7.22, 'learning_rate': 5.399999999999999e-06, 'epoch': 0.05}
{'loss': 6.9989, 'learning_rate': 5.999999999999999e-06, 'epoch': 0.05}
{'loss': 6.8282, 'learning_rate': 6.599999999999999e-06, 'epoch': 0.06}
{'loss': 6.6831, 'learning_rate': 7.2e-06, 'epoch': 0.06}
{'loss': 6.4666, 'learning_rate': 7.799999999999998e-06, 'epoch': 0.07}
{'loss': 6.2614, 'learning_rate': 8.4e-06, 'epoch': 0.07}
2%|█▏ | 17/1115 [07:56<8:09:54, 26.77s/it]
{'loss': 6.0435, 'learning_rate': 9.6e-06, 'epoch': 0.08}
{'loss': 5.8492, 'learning_rate': 1.02e-05, 'epoch': 0.09}
{'loss': 5.8448, 'learning_rate': 1.0799999999999998e-05, 'epoch': 0.09}
{'loss': 5.6986, 'learning_rate': 1.14e-05, 'epoch': 0.09}
{'loss': 5.673, 'learning_rate': 1.1999999999999999e-05, 'epoch': 0.1}
{'loss': 5.4552, 'learning_rate': 1.26e-05, 'epoch': 0.1}
{'loss': 5.4442, 'learning_rate': 1.3199999999999997e-05, 'epoch': 0.11}
{'loss': 5.2917, 'learning_rate': 1.3799999999999998e-05, 'epoch': 0.11}
[WARNING|modeling_utils.py:388] 2022-03-25 19:50:29,531 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
2%|█▊ | 26/1115 [11:46<7:37:52, 25.23s/it]
{'loss': 5.29, 'learning_rate': 1.44e-05, 'epoch': 0.12}
2%|█▉ | 27/1115 [12:10<7:32:30, 24.95s/it]
{'loss': 5.1356, 'learning_rate': 1.4999999999999999e-05, 'epoch': 0.12}
{'loss': 5.1709, 'learning_rate': 1.5599999999999996e-05, 'epoch': 0.13}
2%|█▉ | 27/1115 [12:10<7:32:30, 24.95s/it]g-point operations will not be computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.9962, 'learning_rate': 1.6199999999999997e-05, 'epoch': 0.13} 2%|█▉ | 27/1115 [12:10<7:32:30, 24.95s/it]g-point operations will not be computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 2%|█▉ | 27/1115 [12:10<7:32:30, 24.95s/it]g-point operations will not be computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 2%|█▉ | 27/1115 [12:10<7:32:30, 24.95s/it]g-point operations will not be computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 2%|█▉ | 27/1115 [12:10<7:32:30, 24.95s/it]g-point operations will not be computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 2%|█▉ | 27/1115 [12:10<7:32:30, 24.95s/it]g-point operations will not be computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 2%|█▉ | 27/1115 [12:10<7:32:30, 24.95s/it]g-point operations will not be computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 2%|█▉ | 27/1115 [12:10<7:32:30, 24.95s/it]g-point operations will not be computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 2%|█▉ | 27/1115 [12:10<7:32:30, 24.95s/it]g-point operations will not be computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 2%|█▉ | 27/1115 [12:10<7:32:30, 24.95s/it]g-point operations will not be computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
2%|█▉ | 27/1115 [12:10<7:32:30, 24.95s/it]g-point operations will not be computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 2%|█▉ | 27/1115 [12:10<7:32:30, 24.95s/it]g-point operations will not be computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 3%|██▏ | 30/1115 [13:22<7:16:59, 24.17s/it]g-point operations will not be computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 3%|██▏ | 30/1115 [13:22<7:16:59, 24.17s/it]g-point operations will not be computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 3%|██▏ | 30/1115 [13:22<7:16:59, 24.17s/it]g-point operations will not be computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 3%|██▏ | 30/1115 [13:22<7:16:59, 24.17s/it]g-point operations will not be computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 3%|██▏ | 30/1115 [13:22<7:16:59, 24.17s/it]g-point operations will not be computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 3%|██▏ | 30/1115 [13:22<7:16:59, 24.17s/it]g-point operations will not be computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 3%|██▏ | 30/1115 [13:22<7:16:59, 24.17s/it]g-point operations will not be computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 3%|██▏ | 30/1115 [13:22<7:16:59, 24.17s/it]g-point operations will not be computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
3%|██▏ | 30/1115 [13:22<7:16:59, 24.17s/it]g-point operations will not be computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 3%|██▏ | 30/1115 [13:22<7:16:59, 24.17s/it]g-point operations will not be computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 3%|██▏ | 30/1115 [13:22<7:16:59, 24.17s/it]g-point operations will not be computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 3%|██▏ | 30/1115 [13:22<7:16:59, 24.17s/it]g-point operations will not be computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 3%|██▏ | 30/1115 [13:22<7:16:59, 24.17s/it]g-point operations will not be computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:52:37,185 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:52:37,185 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:52:37,185 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:52:37,185 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:52:37,185 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:52:37,185 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:52:37,185 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:52:37,185 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:52:53,665 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:52:53,665 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:52:57,942 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:52:57,942 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:52:57,942 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.9779, 'learning_rate': 1.7999999999999997e-05, 'epoch': 0.14} [WARNING|modeling_bart.py:1051] 2022-03-25 19:52:57,942 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:52:57,942 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:52:57,942 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:52:57,942 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:52:57,942 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:52:57,942 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:52:57,942 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:52:57,942 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:52:57,942 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:52:57,942 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:52:57,942 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.9887, 'learning_rate': 1.8599999999999998e-05, 'epoch': 0.15} [WARNING|modeling_bart.py:1051] 2022-03-25 19:52:57,942 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:52:57,942 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:52:57,942 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:52:57,942 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:52:57,942 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:52:57,942 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:52:57,942 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:52:57,942 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:53:42,938 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:53:42,938 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.9495, 'learning_rate': 1.92e-05, 'epoch': 0.15} [WARNING|modeling_utils.py:388] 2022-03-25 19:53:47,085 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:53:47,085 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:53:47,085 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:53:53,315 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:53:53,315 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:53:53,315 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:53:59,200 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:53:59,200 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:54:03,224 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:54:03,224 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.8838, 'learning_rate': 1.98e-05, 'epoch': 0.16} [WARNING|modeling_utils.py:388] 2022-03-25 19:54:07,355 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-25 19:54:07,355 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:54:07,355 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:54:07,355 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:54:15,292 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:54:15,292 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:54:15,292 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:54:21,901 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-25 19:54:21,901 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:54:21,901 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:54:21,901 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:54:28,136 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:54:28,136 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 19:54:28,136 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:54:33,775 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-25 19:54:33,775 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:54:33,775 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:54:40,234 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:54:40,234 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:54:40,234 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:54:40,234 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 19:54:40,234 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:43:10,575 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 4.9915, 'learning_rate': 2.1e-05, 'epoch': 0.17}
[WARNING|modeling_utils.py:388] 2022-03-25 19:54:50,488 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 19:55:02,588 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.8265, 'learning_rate': 2.1599999999999996e-05, 'epoch': 0.17}
[WARNING|modeling_utils.py:388] 2022-03-25 19:55:12,581 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-25 19:55:17,012 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 19:55:19,436 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 19:55:25,366 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
3%|██▊ | 39/1115 [16:37<6:14:41, 20.89s/it]
[WARNING|modeling_bart.py:1051] 2022-03-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.9207, 'learning_rate': 2.2199999999999998e-05, 'epoch': 0.17}
[WARNING|modeling_utils.py:388] 2022-03-25 19:55:35,320 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 19:55:37,614 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-25 19:55:41,824 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-25 19:55:45,608 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-25 19:55:49,787 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 19:55:51,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 19:55:54,128 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 19:55:56,350 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 19:55:58,513 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 19:56:00,611 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 19:56:02,686 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.8838, 'learning_rate': 2.34e-05, 'epoch': 0.18}
[WARNING|modeling_utils.py:388] 2022-03-25 19:56:06,321 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 19:56:08,357 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 19:56:10,389 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 19:56:12,374 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 19:56:14,362 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 19:56:16,309 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 19:56:18,275 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 19:56:20,336 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 19:56:22,276 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 19:56:24,152 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 19:56:26,035 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 19:56:27,872 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 19:56:29,719 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 19:56:31,544 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 19:56:33,383 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 19:56:35,280 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 19:56:38,800 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 19:56:40,470 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 19:56:42,126 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 19:56:43,752 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 19:56:46,162 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 19:56:47,779 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 19:56:49,539 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 19:56:52,699 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 19:56:54,230 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 19:56:57,258 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 19:56:58,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 19:57:00,203 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 19:57:03,191 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 19:57:04,566 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 19:57:07,304 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 19:57:08,660 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 19:57:09,945 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 19:57:12,592 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 19:57:15,067 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 19:57:16,293 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 19:57:18,647 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 19:57:20,932 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 19:57:23,228 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 19:57:25,357 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 19:57:27,409 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 19:57:29,395 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 19:57:31,399 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 19:57:33,234 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 19:57:34,990 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 19:57:36,734 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 19:57:39,376 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 19:57:40,972 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 19:57:43,142 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.6302, 'learning_rate': 2.88e-05, 'epoch': 0.22}
[WARNING|modeling_utils.py:388] 2022-03-25 19:57:46,497 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 19:57:50,278 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 19:57:54,031 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 19:57:57,785 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 19:58:01,502 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 19:58:05,184 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 19:58:08,893 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 19:58:12,574 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 6.4289, 'learning_rate': 2.94e-05, 'epoch': 0.23}
[WARNING|modeling_utils.py:388] 2022-03-25 19:58:16,299 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 19:58:19,933 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 19:58:23,566 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 19:58:27,149 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 19:58:30,752 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 19:58:34,342 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 19:58:37,897 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 19:58:41,477 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 6.2769, 'learning_rate': 2.9999999999999997e-05, 'epoch': 0.23}
[WARNING|modeling_utils.py:388] 2022-03-25 19:58:45,170 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 19:58:48,741 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 19:58:52,354 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 19:58:55,940 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 19:58:59,516 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 19:59:03,107 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 19:59:06,670 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 19:59:10,242 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 19:59:13,911 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 19:59:17,402 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 19:59:20,898 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 19:59:24,428 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 5.4623, 'learning_rate': 3.119999999999999e-05, 'epoch': 0.24}
  5%|███▉ | 55/1115 [21:19<7:27:08, 25.31s/it]
{'loss': 4.866, 'learning_rate': 3.2399999999999995e-05, 'epoch': 0.25}
{'loss': 4.8544, 'learning_rate': 3.2999999999999996e-05, 'epoch': 0.26}
{'loss': 4.8141, 'learning_rate': 3.36e-05, 'epoch': 0.26}
  5%|████▏ | 59/1115 [23:10<7:57:38, 27.14s/it]
{'loss': 4.8047, 'learning_rate': 3.42e-05, 'epoch': 0.26}
{'loss': 4.8444, 'learning_rate': 3.48e-05, 'epoch': 0.27}
{'loss': 4.8777, 'learning_rate': 3.539999999999999e-05, 'epoch': 0.27}
{'loss': 4.7374, 'learning_rate': 3.5999999999999994e-05, 'epoch': 0.28}
{'loss': 4.8447, 'learning_rate': 3.6599999999999995e-05, 'epoch': 0.28}
{'loss': 4.7271, 'learning_rate': 3.7199999999999996e-05, 'epoch': 0.29}
{'loss': 4.691, 'learning_rate': 3.78e-05, 'epoch': 0.29}
{'loss': 4.7024, 'learning_rate': 3.84e-05, 'epoch': 0.3}
{'loss': 4.6623, 'learning_rate': 3.9e-05, 'epoch': 0.3}
6%|████▊ | 68/1115 [27:08<7:35:02, 26.08s/it]
{'loss': 4.6917, 'learning_rate': 3.96e-05, 'epoch': 0.3}
{'loss': 4.7466, 'learning_rate': 4.02e-05, 'epoch': 0.31}
{'loss': 4.6977, 'learning_rate': 4.08e-05, 'epoch': 0.31}
{'loss': 4.6366, 'learning_rate': 4.14e-05, 'epoch': 0.32}
6%|█████ | 72/1115 [28:50<7:23:22, 25.51s/it]
{'loss': 4.6913, 'learning_rate': 4.2e-05, 'epoch': 0.32}
7%|█████▏ | 73/1115 [29:15<7:19:23, 25.30s/it]
{'loss': 4.5726, 'learning_rate': 4.259999999999999e-05, 'epoch': 0.33}
[WARNING|modeling_utils.py:388] 2022-03-25 20:08:15,623 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.6529, 'learning_rate': 4.319999999999999e-05, 'epoch': 0.33}
{'loss': 4.7602, 'learning_rate': 4.3799999999999994e-05, 'epoch': 0.34}
{'loss': 4.6657, 'learning_rate': 4.4399999999999995e-05, 'epoch': 0.34}
{'loss': 4.5708, 'learning_rate': 4.4999999999999996e-05, 'epoch': 0.35}
[WARNING|modeling_utils.py:388] 2022-03-25 20:09:54,223 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
7%|█████▌ | 78/1115 [31:16<6:58:40, 24.22s/it]
{'loss': 4.5891, 'learning_rate': 4.62e-05, 'epoch': 0.35}
[WARNING|modeling_bart.py:1051] 2022-03-25 20:10:47,621 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.695, 'learning_rate': 4.68e-05, 'epoch': 0.36}
{'loss': 4.5544, 'learning_rate': 4.7399999999999993e-05, 'epoch': 0.36}
[WARNING|modeling_utils.py:388] 2022-03-25 20:11:38,713 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.6184, 'learning_rate': 4.7999999999999994e-05, 'epoch': 0.37}
[WARNING|modeling_utils.py:388] 2022-03-25 20:11:59,428 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.6035, 'learning_rate': 4.8599999999999995e-05, 'epoch': 0.37}
[WARNING|modeling_utils.py:388] 2022-03-25 20:12:07,584 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:12:11,588 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.6037, 'learning_rate': 4.9199999999999997e-05, 'epoch': 0.38}
[WARNING|modeling_utils.py:388] 2022-03-25 20:12:34,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.6785, 'learning_rate': 4.98e-05, 'epoch': 0.38}
[WARNING|modeling_utils.py:388] 2022-03-25 20:12:48,500 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-25 20:13:00,875 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.4987, 'learning_rate': 5.04e-05, 'epoch': 0.39}
[WARNING|modeling_bart.py:1051] 2022-03-25 20:13:11,137 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-25 20:13:19,035 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:13:25,347 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:13:31,498 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-25 20:13:41,520 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:13:43,932 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.5589, 'learning_rate': 5.1599999999999994e-05, 'epoch': 0.39}
[WARNING|modeling_bart.py:1051] 2022-03-25 20:13:47,642 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:13:49,981 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-25 20:13:53,962 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:13:56,317 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:14:02,082 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.6228, 'learning_rate': 5.2199999999999995e-05, 'epoch': 0.4}
[WARNING|modeling_utils.py:388] 2022-03-25 20:14:07,922 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:14:10,172 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-25 20:14:14,230 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:14:16,428 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-25 20:14:20,106 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 20:14:22,417 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 20:14:22,417 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 20:14:24,615 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 20:14:26,747 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 20:14:26,747 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 20:14:30,655 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 19:55:27,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-25 20:14:32,761 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:14:34,818 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:14:36,906 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
8%|██████▍ | 91/1115 [35:49<5:21:59, 18.87s/it]
[WARNING|modeling_bart.py:1051] 2022-03-25 20:14:39,065 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:14:41,087 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:14:43,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:14:45,043 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:14:47,014 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:14:48,964 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:14:50,879 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:14:52,802 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
8%|██████▌ | 92/1115 [36:04<5:05:50, 17.94s/it]
[WARNING|modeling_bart.py:1051] 2022-03-25 20:14:54,795 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:14:56,676 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:14:58,580 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:15:00,418 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:15:02,291 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:15:05,911 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:15:07,673 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
8%|██████▌ | 93/1115 [36:19<4:49:34, 17.00s/it]
[WARNING|modeling_bart.py:1051] 2022-03-25 20:15:09,548 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:15:11,297 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:15:13,000 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:15:14,711 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:15:18,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:15:22,223 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
8%|██████▋ | 94/1115 [36:34<4:36:05, 16.23s/it]
[WARNING|modeling_bart.py:1051] 2022-03-25 20:15:23,955 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:15:25,523 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:15:27,102 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:15:30,194 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:15:31,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:15:33,200 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
9%|██████▋ | 95/1115 [36:46<4:16:12, 15.07s/it]
[WARNING|modeling_bart.py:1051] 2022-03-25 20:15:36,239 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:15:37,671 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:15:40,437 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:15:41,782 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:15:44,392 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
9%|██████▊ | 96/1115 [36:57<3:54:36, 13.81s/it]
[WARNING|modeling_bart.py:1051] 2022-03-25 20:15:47,079 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:15:48,321 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:15:50,754 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:15:51,920 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:15:54,255 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
9%|██████▊ | 97/1115 [37:06<3:32:50, 12.54s/it]
[WARNING|modeling_bart.py:1051] 2022-03-25 20:15:56,571 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:15:57,652 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:15:59,755 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:16:01,798 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
9%|██████▉ | 98/1115 [37:15<3:10:48, 11.26s/it]
[WARNING|modeling_bart.py:1051] 2022-03-25 20:16:04,777 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:16:06,629 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:16:08,422 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:16:10,220 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
9%|███████ | 99/1115 [37:22<2:50:40, 10.08s/it]
[WARNING|modeling_bart.py:1051] 2022-03-25 20:16:12,109 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:16:14,564 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:16:16,816 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
9%|██████▉ | 100/1115 [37:29<2:35:08, 9.17s/it]
[WARNING|modeling_bart.py:1051] 2022-03-25 20:16:20,188 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:16:23,961 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:16:27,625 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:16:31,346 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:16:34,983 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:16:38,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:16:42,258 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:16:45,910 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
9%|███████ | 101/1115 [37:59<4:18:26, 15.29s/it]
[WARNING|modeling_bart.py:1051] 2022-03-25 20:16:49,633 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 5.9227, 'learning_rate': 5.94e-05, 'epoch': 0.45}
[WARNING|modeling_bart.py:1051] 2022-03-25 20:16:53,231 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:16:56,812 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:17:00,331 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:17:03,917 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:17:07,505 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:17:11,063 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:17:14,649 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
9%|███████▏ | 102/1115 [38:27<5:25:57, 19.31s/it]
[WARNING|modeling_bart.py:1051] 2022-03-25 20:17:18,307 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 5.8449, 'learning_rate': 5.9999999999999995e-05, 'epoch': 0.46}
[WARNING|modeling_bart.py:1051] 2022-03-25 20:17:21,842 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:17:25,350 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:17:28,860 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:17:32,435 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:17:36,011 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:17:39,534 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:17:43,028 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
9%|███████▏ | 103/1115 [38:56<6:11:14, 22.01s/it][WARNING|modeling_bart.py:1051] 2022-03-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 9%|███████▏ | 103/1115 [38:56<6:11:14, 22.01s/it][WARNING|modeling_bart.py:1051] 2022-03-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 20:17:50,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 20:17:50,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 20:17:53,590 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 20:17:57,051 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 20:17:57,051 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 20:18:00,517 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 20:18:00,517 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 20:18:03,995 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 20:18:07,426 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 20:18:07,426 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 20:18:07,426 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 9%|███████▎ | 104/1115 [39:24<6:40:19, 23.76s/it] Setting `use_cache=False`...1] 2022-03-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 9%|███████▎ | 104/1115 [39:24<6:40:19, 23.76s/it] Setting `use_cache=False`...1] 2022-03-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 5.1179, 'learning_rate': 6.12e-05, 'epoch': 0.47}
{'loss': 4.9785, 'learning_rate': 6.18e-05, 'epoch': 0.47}
10%|███████▍ | 106/1115 [40:19<7:12:54, 25.74s/it]
{'loss': 4.8924, 'learning_rate': 6.239999999999999e-05, 'epoch': 0.48}
{'loss': 4.7039, 'learning_rate': 6.299999999999999e-05, 'epoch': 0.48}
10%|███████▌ | 108/1115 [41:15<7:30:13, 26.83s/it]
{'loss': 4.7125, 'learning_rate': 6.359999999999999e-05, 'epoch': 0.48}
{'loss': 4.6977, 'learning_rate': 6.419999999999999e-05, 'epoch': 0.49}
{'loss': 4.6824, 'learning_rate': 6.479999999999999e-05, 'epoch': 0.49}
10%|███████▊ | 111/1115 [42:35<7:26:51, 26.70s/it]
{'loss': 4.6191, 'learning_rate': 6.539999999999999e-05, 'epoch': 0.5}
{'loss': 4.6363, 'learning_rate': 6.599999999999999e-05, 'epoch': 0.5}
10%|███████▉ | 113/1115 [43:28<7:25:48, 26.69s/it]
{'loss': 4.6369, 'learning_rate': 6.659999999999999e-05, 'epoch': 0.51}
{'loss': 4.5246, 'learning_rate': 6.72e-05, 'epoch': 0.51}
{'loss': 4.5943, 'learning_rate': 6.78e-05, 'epoch': 0.52}
10%|███████▉ | 113/1115 [43:28<7:25:48, 26.69s/it] Setting `use_cache=False`...1] 2022-03-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|███████▉ | 113/1115 [43:28<7:25:48, 26.69s/it] Setting `use_cache=False`...1] 2022-03-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|███████▉ | 113/1115 [43:28<7:25:48, 26.69s/it] Setting `use_cache=False`...1] 2022-03-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|███████▉ | 113/1115 [43:28<7:25:48, 26.69s/it] Setting `use_cache=False`...1] 2022-03-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|███████▉ | 113/1115 [43:28<7:25:48, 26.69s/it] Setting `use_cache=False`...1] 2022-03-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|███████▉ | 113/1115 [43:28<7:25:48, 26.69s/it] Setting `use_cache=False`...1] 2022-03-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|███████▉ | 113/1115 [43:28<7:25:48, 26.69s/it] Setting `use_cache=False`...1] 2022-03-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|███████▉ | 113/1115 [43:28<7:25:48, 26.69s/it] Setting `use_cache=False`...1] 2022-03-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|███████▉ | 113/1115 [43:28<7:25:48, 26.69s/it] Setting `use_cache=False`...1] 2022-03-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 10%|████████ | 116/1115 [44:46<7:14:15, 26.08s/it] Setting `use_cache=False`...1] 2022-03-25 20:17:46,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 4.5545, 'learning_rate': 6.84e-05, 'epoch': 0.52}
{'loss': 4.5898, 'learning_rate': 6.9e-05, 'epoch': 0.52}
{'loss': 4.5237, 'learning_rate': 6.96e-05, 'epoch': 0.53}
{'loss': 4.5241, 'learning_rate': 7.02e-05, 'epoch': 0.53}
{'loss': 4.5408, 'learning_rate': 7.079999999999999e-05, 'epoch': 0.54}
{'loss': 4.4809, 'learning_rate': 7.139999999999999e-05, 'epoch': 0.54}
{'loss': 4.5278, 'learning_rate': 7.199999999999999e-05, 'epoch': 0.55}
{'loss': 4.4764, 'learning_rate': 7.259999999999999e-05, 'epoch': 0.55}
{'loss': 4.4472, 'learning_rate': 7.319999999999999e-05, 'epoch': 0.56}
[WARNING|modeling_utils.py:388] 2022-03-25 20:27:11,760 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.5971, 'learning_rate': 7.379999999999999e-05, 'epoch': 0.56}
{'loss': 4.4813, 'learning_rate': 7.439999999999999e-05, 'epoch': 0.57}
[WARNING|modeling_bart.py:1051] 2022-03-25 20:27:55,361 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.5371, 'learning_rate': 7.5e-05, 'epoch': 0.57}
{'loss': 4.5538, 'learning_rate': 7.56e-05, 'epoch': 0.57}
{'loss': 4.5222, 'learning_rate': 7.62e-05, 'epoch': 0.58}
[WARNING|modeling_utils.py:388] 2022-03-25 20:29:15,626 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.5567, 'learning_rate': 7.68e-05, 'epoch': 0.58}
[WARNING|modeling_bart.py:1051] 2022-03-25 20:29:28,089 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-25 20:29:34,231 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
12%|█████████▏ | 131/1115 [50:50<6:21:26, 23.26s/it]
{'loss': 4.5519, 'learning_rate': 7.74e-05, 'epoch': 0.59}
[WARNING|modeling_utils.py:388] 2022-03-25 20:29:54,689 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:30:03,033 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.3829, 'learning_rate': 7.8e-05, 'epoch': 0.59}
[WARNING|modeling_utils.py:388] 2022-03-25 20:30:19,580 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
12%|█████████▎ | 133/1115 [51:35<6:14:12, 22.86s/it]
[WARNING|modeling_utils.py:388] 2022-03-25 20:30:27,790 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-25 20:30:44,450 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.4999, 'learning_rate': 7.92e-05, 'epoch': 0.6}
[WARNING|modeling_utils.py:388] 2022-03-25 20:30:50,294 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.5518, 'learning_rate': 7.98e-05, 'epoch': 0.61}
[WARNING|modeling_bart.py:1051] 2022-03-25 20:31:10,788 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:31:21,068 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.48, 'learning_rate': 8.04e-05, 'epoch': 0.61}
[WARNING|modeling_bart.py:1051] 2022-03-25 20:31:35,186 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-25 20:31:43,240 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.5714, 'learning_rate': 8.1e-05, 'epoch': 0.61}
[WARNING|modeling_bart.py:1051] 2022-03-25 20:31:51,607 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:32:03,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
12%|█████████▋ | 138/1115 [53:19<5:42:12, 21.02s/it]
[WARNING|modeling_bart.py:1051] 2022-03-25 20:32:09,765 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.4858, 'learning_rate': 8.16e-05, 'epoch': 0.62}
[WARNING|modeling_utils.py:388] 2022-03-25 20:32:13,784 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:32:19,685 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:32:22,016 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-25 20:32:26,259 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
12%|█████████▋ | 139/1115 [53:38<5:31:59, 20.41s/it]
[WARNING|modeling_utils.py:388] 2022-03-25 20:32:30,237 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-25 20:32:34,404 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-25 20:32:38,209 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:32:40,417 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-25 20:32:44,450 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
13%|█████████▊ | 140/1115 [53:56<5:20:34, 19.73s/it]
[WARNING|modeling_bart.py:1051] 2022-03-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.5127, 'learning_rate': 8.28e-05, 'epoch': 0.63}
[WARNING|modeling_utils.py:388] 2022-03-25 20:32:50,427 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:32:52,562 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:32:54,679 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:32:56,796 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:32:58,900 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:33:00,959 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:33:03,026 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:33:05,137 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:33:07,147 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:33:09,113 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:33:11,052 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:33:12,978 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:33:14,925 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:33:16,837 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:33:18,716 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:33:20,689 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:33:22,544 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:33:24,369 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:33:26,199 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:33:28,003 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:33:29,760 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:33:33,262 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:33:35,121 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:33:36,822 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:33:38,530 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:33:40,176 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:33:41,787 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:33:44,195 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:33:47,349 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:33:49,042 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:33:50,583 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:33:53,545 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:33:55,019 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:33:57,843 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:33:59,189 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:34:01,970 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:34:03,284 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:34:04,587 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:34:07,132 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:34:09,602 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:34:10,901 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:34:13,280 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:34:15,563 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:34:17,830 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:34:18,933 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:34:21,179 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:34:23,246 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:34:25,228 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:34:27,134 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:34:29,964 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:34:31,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:34:33,427 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:34:35,988 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:34:37,526 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:34:39,756 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:34:41,253 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:34:45,319 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:34:49,013 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:34:52,654 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:34:56,314 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:34:59,964 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:35:03,530 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:35:07,161 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:35:10,744 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:35:14,482 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:35:18,049 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:35:21,604 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:35:25,102 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:35:28,667 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:35:32,191 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:35:35,741 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:35:39,262 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:35:42,851 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:35:46,345 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:35:49,772 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:35:53,222 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:35:56,667 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:36:00,124 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:36:03,624 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:36:03,624 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 20:36:07,124 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 20:36:07,124 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 5.3625, 'learning_rate': 9.059999999999999e-05, 'epoch': 0.69} [WARNING|modeling_utils.py:388] 2022-03-25 20:36:10,589 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 20:36:14,005 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 20:36:14,005 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 20:36:17,430 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-25 20:36:20,814 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:36:24,283 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:36:27,708 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:36:31,151 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:36:34,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.9468, 'learning_rate': 9.12e-05, 'epoch': 0.69}
{'loss': 4.8548, 'learning_rate': 9.18e-05, 'epoch': 0.7}
{'loss': 4.7858, 'learning_rate': 9.24e-05, 'epoch': 0.7}
{'loss': 4.6516, 'learning_rate': 9.3e-05, 'epoch': 0.7}
{'loss': 4.6538, 'learning_rate': 9.36e-05, 'epoch': 0.71}
{'loss': 4.5987, 'learning_rate': 9.419999999999999e-05, 'epoch': 0.71}
{'loss': 4.6631, 'learning_rate': 9.479999999999999e-05, 'epoch': 0.72}
14%|██████████▉ | 161/1115 [1:00:54<6:59:52, 26.41s/it]
{'loss': 4.5287, 'learning_rate': 9.599999999999999e-05, 'epoch': 0.73}
{'loss': 4.5528, 'learning_rate': 9.659999999999999e-05, 'epoch': 0.73}
{'loss': 4.5033, 'learning_rate': 9.719999999999999e-05, 'epoch': 0.74}
15%|███████████▏ | 165/1115 [1:02:45<7:12:05, 27.29s/it]
{'loss': 4.5908, 'learning_rate': 9.779999999999999e-05, 'epoch': 0.74}
{'loss': 4.4931, 'learning_rate': 9.839999999999999e-05, 'epoch': 0.74}
{'loss': 4.519, 'learning_rate': 9.9e-05, 'epoch': 0.75}
{'loss': 4.5015, 'learning_rate': 9.96e-05, 'epoch': 0.75}
15%|███████████▌ | 169/1115 [1:04:31<7:05:23, 26.98s/it]
{'loss': 4.3689, 'learning_rate': 0.0001002, 'epoch': 0.76}
{'loss': 4.5586, 'learning_rate': 0.0001008, 'epoch': 0.76}
{'loss': 4.4859, 'learning_rate': 0.0001014, 'epoch': 0.77}
[WARNING|modeling_bart.py:1051] 2022-03-25 20:44:25,126 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.3494, 'learning_rate': 0.000102, 'epoch': 0.77}
{'loss': 4.4217, 'learning_rate': 0.0001026, 'epoch': 0.78}
{'loss': 4.3855, 'learning_rate': 0.00010319999999999999, 'epoch': 0.78}
{'loss': 4.4697, 'learning_rate': 0.00010379999999999999, 'epoch': 0.78}
{'loss': 4.4196, 'learning_rate': 0.00010439999999999999, 'epoch': 0.79}
[WARNING|modeling_utils.py:388] 2022-03-25 20:46:25,771 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:46:40,492 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-25 20:47:03,179 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:47:13,318 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.4174, 'learning_rate': 0.00010619999999999998, 'epoch': 0.8}
[WARNING|modeling_bart.py:1051] 2022-03-25 20:47:46,414 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.4003, 'learning_rate': 0.00010679999999999998, 'epoch': 0.81}
{'loss': 4.5487, 'learning_rate': 0.00010739999999999998, 'epoch': 0.81}
[WARNING|modeling_utils.py:388] 2022-03-25 20:48:20,765 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:48:24,953 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:48:29,111 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.35, 'learning_rate': 0.00010799999999999998, 'epoch': 0.82}
16%|████████████▍ | 183/1115 [1:10:06<5:58:04, 23.05s/it]
{'loss': 4.4523, 'learning_rate': 0.00010919999999999998, 'epoch': 0.83}
[WARNING|modeling_utils.py:388] 2022-03-25 20:49:22,684 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:49:22,684 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 20:49:37,112 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 20:49:37,112 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 20:49:37,112 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 20:49:41,123 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 20:49:41,123 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 20:49:41,123 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-25 20:49:41,123 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 20:49:41,123 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 20:49:51,621 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 20:49:51,621 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 20:49:51,621 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 20:49:51,621 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 20:49:59,420 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-25 20:49:59,420 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.3919, 'learning_rate': 0.00011039999999999999, 'epoch': 0.83} [WARNING|modeling_utils.py:388] 2022-03-25 20:50:03,404 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 20:50:03,404 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 20:50:03,404 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 20:50:03,404 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 20:50:03,404 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 20:50:13,600 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 20:32:46,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-25 20:50:19,867 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.4118, 'learning_rate': 0.00011099999999999999, 'epoch': 0.84}
[WARNING|modeling_utils.py:388] 2022-03-25 20:50:26,199 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:50:34,447 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:50:40,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.5025, 'learning_rate': 0.00011159999999999999, 'epoch': 0.84}
[WARNING|modeling_utils.py:388] 2022-03-25 20:50:44,443 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:50:50,456 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:50:52,938 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:50:58,900 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.4032, 'learning_rate': 0.00011219999999999999, 'epoch': 0.85}
[WARNING|modeling_utils.py:388] 2022-03-25 20:51:08,441 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:51:10,741 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:51:13,043 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:51:18,777 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:51:21,156 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-25 20:51:25,280 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:51:27,536 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-25 20:51:31,243 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:51:33,432 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:51:35,604 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.4694, 'learning_rate': 0.00011339999999999999, 'epoch': 0.86}
[WARNING|modeling_bart.py:1051] 2022-03-25 20:51:39,597 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:51:41,695 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:51:43,805 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:51:45,888 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:51:47,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:51:50,007 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:51:52,008 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
17%|█████████████ | 192/1115 [1:13:04<4:46:21, 18.61s/it]
[WARNING|modeling_bart.py:1051] 2022-03-25 20:51:54,105 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:51:56,083 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:51:58,041 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:51:59,999 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:52:01,939 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:52:03,819 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:52:05,702 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:52:07,548 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
17%|█████████████▏ | 193/1115 [1:13:19<4:31:24, 17.66s/it]
[WARNING|modeling_bart.py:1051] 2022-03-25 20:52:09,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:52:11,299 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:52:13,099 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:52:16,681 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:52:18,423 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:52:22,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
17%|█████████████▏ | 194/1115 [1:13:34<4:18:51, 16.86s/it]
[WARNING|modeling_bart.py:1051] 2022-03-25 20:52:24,484 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:52:26,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:52:27,739 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:52:30,979 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:52:32,553 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:52:34,110 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
17%|█████████████▎ | 195/1115 [1:13:47<4:00:14, 15.67s/it]
[WARNING|modeling_bart.py:1051] 2022-03-25 20:52:37,255 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:52:38,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:52:40,229 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:52:41,678 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:52:44,448 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:52:47,140 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
18%|█████████████▎ | 196/1115 [1:13:58<3:40:18, 14.38s/it]
[WARNING|modeling_bart.py:1051] 2022-03-25 20:52:48,569 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:52:51,174 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:52:52,447 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:52:54,942 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:52:57,332 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
18%|█████████████▍ | 197/1115 [1:14:08<3:20:17, 13.09s/it]
[WARNING|modeling_bart.py:1051] 2022-03-25 20:52:58,594 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:53:00,882 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:53:03,079 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:53:05,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
18%|█████████████▍ | 198/1115 [1:14:17<3:00:05, 11.78s/it]
[WARNING|modeling_bart.py:1051] 2022-03-25 20:53:07,256 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:53:09,155 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:53:11,040 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:53:12,843 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:53:14,725 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:53:15,559 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:53:17,178 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:53:19,391 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
18%|█████████████▋ | 200/1115 [1:14:32<2:24:20, 9.47s/it]
[WARNING|modeling_bart.py:1051] 2022-03-25 20:53:22,975 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:53:26,756 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:53:30,373 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:53:33,972 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:53:37,543 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:53:41,116 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:53:44,670 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:53:48,185 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
18%|█████████████▋ | 201/1115 [1:15:01<3:53:55, 15.36s/it]
[WARNING|modeling_bart.py:1051] 2022-03-25 20:53:51,769 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:53:55,282 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:53:58,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:54:02,235 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:54:05,668 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:54:09,111 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:54:12,545 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:54:15,981 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
18%|█████████████▊ | 202/1115 [1:15:29<4:50:10, 19.07s/it]
[WARNING|modeling_bart.py:1051] 2022-03-25 20:54:19,462 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:54:22,863 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:54:26,261 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:54:29,633 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:54:32,983 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:54:36,372 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:54:39,668 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:54:43,039 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
18%|█████████████▊ | 203/1115 [1:15:56<5:26:12, 21.46s/it]
[WARNING|modeling_bart.py:1051] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:54:49,851 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:54:53,179 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:54:56,526 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:54:59,875 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 20:55:03,159 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.9813, 'learning_rate': 0.00012119999999999999, 'epoch': 0.91}
{'loss': 4.7145, 'learning_rate': 0.00012179999999999999, 'epoch': 0.92}
{'loss': 4.7888, 'learning_rate': 0.0001224, 'epoch': 0.92}
19%|██████████████ | 207/1115 [1:17:41<6:19:30, 25.08s/it]
{'loss': 4.6485, 'learning_rate': 0.00012299999999999998, 'epoch': 0.93}
19%|██████████████ | 207/1115 [1:17:41<6:19:30, 25.08s/it] Setting `use_cache=False`...1] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 19%|██████████████ | 207/1115 [1:17:41<6:19:30, 25.08s/it] Setting `use_cache=False`...1] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 19%|██████████████ | 207/1115 [1:17:41<6:19:30, 25.08s/it] Setting `use_cache=False`...1] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 19%|██████████████ | 207/1115 [1:17:41<6:19:30, 25.08s/it] Setting `use_cache=False`...1] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 19%|██████████████ | 207/1115 [1:17:41<6:19:30, 25.08s/it] Setting `use_cache=False`...1] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 19%|██████████████ | 207/1115 [1:17:41<6:19:30, 25.08s/it] Setting `use_cache=False`...1] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.6296, 'learning_rate': 0.0001236, 'epoch': 0.93} 19%|██████████████ | 207/1115 [1:17:41<6:19:30, 25.08s/it] Setting `use_cache=False`...1] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 19%|██████████████ | 207/1115 [1:17:41<6:19:30, 25.08s/it] Setting `use_cache=False`...1] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 19%|██████████████ | 207/1115 [1:17:41<6:19:30, 25.08s/it] Setting `use_cache=False`...1] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
19%|██████████████ | 207/1115 [1:17:41<6:19:30, 25.08s/it] Setting `use_cache=False`...1] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 19%|██████████████ | 207/1115 [1:17:41<6:19:30, 25.08s/it] Setting `use_cache=False`...1] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 19%|██████████████ | 207/1115 [1:17:41<6:19:30, 25.08s/it] Setting `use_cache=False`...1] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 19%|██████████████ | 207/1115 [1:17:41<6:19:30, 25.08s/it] Setting `use_cache=False`...1] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 19%|██████████████ | 207/1115 [1:17:41<6:19:30, 25.08s/it] Setting `use_cache=False`...1] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 19%|██████████████ | 207/1115 [1:17:41<6:19:30, 25.08s/it] Setting `use_cache=False`...1] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 19%|██████████████ | 207/1115 [1:17:41<6:19:30, 25.08s/it] Setting `use_cache=False`...1] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 19%|██████████████ | 207/1115 [1:17:41<6:19:30, 25.08s/it] Setting `use_cache=False`...1] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 19%|██████████████ | 207/1115 [1:17:41<6:19:30, 25.08s/it] Setting `use_cache=False`...1] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 4.4968, 'learning_rate': 0.00012419999999999998, 'epoch': 0.94} 19%|██████████████ | 207/1115 [1:17:41<6:19:30, 25.08s/it] Setting `use_cache=False`...1] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 19%|██████████████ | 207/1115 [1:17:41<6:19:30, 25.08s/it] Setting `use_cache=False`...1] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 19%|██████████████ | 207/1115 [1:17:41<6:19:30, 25.08s/it] Setting `use_cache=False`...1] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 19%|██████████████ | 207/1115 [1:17:41<6:19:30, 25.08s/it] Setting `use_cache=False`...1] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 19%|██████████████ | 207/1115 [1:17:41<6:19:30, 25.08s/it] Setting `use_cache=False`...1] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 19%|██████████████ | 207/1115 [1:17:41<6:19:30, 25.08s/it] Setting `use_cache=False`...1] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 19%|██████████████ | 207/1115 [1:17:41<6:19:30, 25.08s/it] Setting `use_cache=False`...1] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 19%|██████████████ | 207/1115 [1:17:41<6:19:30, 25.08s/it] Setting `use_cache=False`...1] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 19%|██████████████ | 207/1115 [1:17:41<6:19:30, 25.08s/it] Setting `use_cache=False`...1] 2022-03-25 20:54:46,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 4.514, 'learning_rate': 0.00012479999999999997, 'epoch': 0.94}
[WARNING|modeling_bart.py:1051] 2022-03-25 20:58:05,041 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
19%|██████████████▍ | 211/1115 [1:19:19<6:10:13, 24.57s/it]
{'loss': 4.5182, 'learning_rate': 0.00012539999999999999, 'epoch': 0.95}
{'loss': 4.534, 'learning_rate': 0.00012599999999999997, 'epoch': 0.95}
19%|██████████████▌ | 213/1115 [1:20:06<6:03:32, 24.18s/it]
{'loss': 4.5318, 'learning_rate': 0.0001266, 'epoch': 0.96}
{'loss': 4.4489, 'learning_rate': 0.00012719999999999997, 'epoch': 0.96}
[WARNING|modeling_utils.py:388] 2022-03-25 20:59:24,732 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 20:59:37,159 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.3867, 'learning_rate': 0.0001278, 'epoch': 0.96}
[WARNING|modeling_utils.py:388] 2022-03-25 21:00:01,324 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 21:00:05,344 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 21:00:15,526 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 21:00:21,747 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.3823, 'learning_rate': 0.000129, 'epoch': 0.97}
[WARNING|modeling_utils.py:388] 2022-03-25 21:00:27,994 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-25 21:00:35,841 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-25 21:00:39,698 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 21:00:41,960 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.3786, 'learning_rate': 0.00012959999999999998, 'epoch': 0.98}
[WARNING|modeling_bart.py:1051] 2022-03-25 21:00:46,057 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:00:48,192 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:00:50,303 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:00:52,384 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:00:54,422 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:00:58,226 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
20%|██████████████▉ | 219/1115 [1:22:10<4:59:37, 20.06s/it]
[WARNING|modeling_bart.py:1051] 2022-03-25 21:01:00,349 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:01:02,247 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:01:04,095 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:01:05,918 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:01:07,720 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:01:09,491 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:01:12,886 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
20%|██████████████▉ | 220/1115 [1:22:24<4:34:08, 18.38s/it]
[WARNING|modeling_bart.py:1051] 2022-03-25 21:01:14,615 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:01:16,202 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:01:19,273 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:01:20,751 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:01:23,462 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:01:24,758 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:01:26,124 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:01:27,330 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:01:29,610 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:01:30,711 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:01:32,795 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
20%|███████████████▏ | 222/1115 [1:22:45<3:29:46, 14.09s/it]
[WARNING|modeling_bart.py:1051] 2022-03-25 21:01:34,832 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:01:37,472 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:01:39,095 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
20%|███████████████▏ | 223/1115 [1:22:51<2:55:41, 11.82s/it]
[WARNING|modeling_bart.py:1051] 2022-03-25 21:01:42,507 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:01:46,333 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:01:50,013 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:01:53,614 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:01:57,256 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:02:00,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:02:04,403 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:02:07,965 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
20%|███████████████▎ | 224/1115 [1:23:21<4:13:54, 17.10s/it]
[WARNING|modeling_bart.py:1051] 2022-03-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:02:15,169 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:02:18,640 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:02:22,143 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:02:25,646 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:02:29,137 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:02:32,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:02:36,102 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 5.3673, 'learning_rate': 0.0001338, 'epoch': 1.01}
{'loss': 5.1522, 'learning_rate': 0.0001344, 'epoch': 1.01}
20%|███████████████▍ | 227/1115 [1:24:45<5:57:17, 24.14s/it]
{'loss': 4.7286, 'learning_rate': 0.000135, 'epoch': 1.02}
{'loss': 4.6139, 'learning_rate': 0.0001356, 'epoch': 1.02}
{'loss': 4.5005, 'learning_rate': 0.0001362, 'epoch': 1.03}
{'loss': 4.4463, 'learning_rate': 0.0001368, 'epoch': 1.03}
{'loss': 4.4191, 'learning_rate': 0.0001374, 'epoch': 1.04}
{'loss': 4.3203, 'learning_rate': 0.000138, 'epoch': 1.04}
{'loss': 4.4397, 'learning_rate': 0.0001386, 'epoch': 1.04}
{'loss': 4.3437, 'learning_rate': 0.0001392, 'epoch': 1.05}
{'loss': 4.3979, 'learning_rate': 0.00013979999999999998, 'epoch': 1.05}
21%|████████████████ | 236/1115 [1:28:43<6:20:51, 26.00s/it]
{'loss': 4.4071, 'learning_rate': 0.0001404, 'epoch': 1.06}
{'loss': 4.3001, 'learning_rate': 0.00014099999999999998, 'epoch': 1.06}
21%|████████████████▏ | 238/1115 [1:29:35<6:19:04, 25.93s/it]
{'loss': 4.36, 'learning_rate': 0.00014159999999999997, 'epoch': 1.07}
{'loss': 4.3493, 'learning_rate': 0.0001422, 'epoch': 1.07}
{'loss': 4.4577, 'learning_rate': 0.00014279999999999997, 'epoch': 1.08}
{'loss': 4.2136, 'learning_rate': 0.0001434, 'epoch': 1.08}
{'loss': 4.2935, 'learning_rate': 0.00014399999999999998, 'epoch': 1.09}
22%|████████████████▌ | 243/1115 [1:31:40<6:04:57, 25.11s/it]
{'loss': 4.2193, 'learning_rate': 0.0001446, 'epoch': 1.09}
{'loss': 4.3509, 'learning_rate': 0.00014519999999999998, 'epoch': 1.09}
[WARNING|modeling_bart.py:1051] 2022-03-25 21:11:11,652 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.2148, 'learning_rate': 0.0001458, 'epoch': 1.1}
[WARNING|modeling_utils.py:388] 2022-03-25 21:11:31,875 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 21:11:31,875 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.2056, 'learning_rate': 0.00014639999999999998, 'epoch': 1.1} [WARNING|modeling_utils.py:388] 2022-03-25 21:11:31,875 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 21:11:31,875 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 21:11:31,875 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 21:11:31,875 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 21:11:31,875 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 21:11:31,875 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-25 21:11:31,875 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 21:11:31,875 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 21:11:31,875 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 21:11:31,875 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 21:11:31,875 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.2163, 'learning_rate': 0.000147, 'epoch': 1.11} [WARNING|modeling_utils.py:388] 2022-03-25 21:11:31,875 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 21:11:31,875 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-25 21:11:31,875 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 21:11:31,875 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 21:11:31,875 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 21:11:31,875 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 21:11:31,875 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 21:11:31,875 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 21:11:31,875 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
 22%|████████████████▉ | 248/1115 [1:33:42<5:50:14, 24.24s/it]
{'loss': 4.2646, 'learning_rate': 0.00014759999999999998, 'epoch': 1.11}
{'loss': 4.1475, 'learning_rate': 0.0001482, 'epoch': 1.12}
{'loss': 4.1661, 'learning_rate': 0.00014879999999999998, 'epoch': 1.12}
 23%|█████████████████ | 251/1115 [1:34:53<5:43:34, 23.86s/it]
{'loss': 4.0668, 'learning_rate': 0.0001494, 'epoch': 1.13}
 23%|█████████████████▏ | 252/1115 [1:35:16<5:39:18, 23.59s/it]
{'loss': 4.0718, 'learning_rate': 0.00015, 'epoch': 1.13}
[WARNING|modeling_utils.py:388] 2022-03-25 21:14:19,784 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.1521, 'learning_rate': 0.00015059999999999997, 'epoch': 1.13}
{'loss': 4.0864, 'learning_rate': 0.0001512, 'epoch': 1.14}
 23%|█████████████████▍ | 255/1115 [1:36:23<5:26:32, 22.78s/it]
{'loss': 4.0792, 'learning_rate': 0.00015179999999999998, 'epoch': 1.14}
[WARNING|modeling_utils.py:388] 2022-03-25 21:15:17,101 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 21:15:25,310 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 21:15:29,454 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 21:15:33,484 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.0559, 'learning_rate': 0.0001524, 'epoch': 1.15}
[WARNING|modeling_utils.py:388] 2022-03-25 21:15:43,759 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 21:15:43,759 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 21:15:43,759 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 21:15:43,759 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 21:15:43,759 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 21:15:43,759 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 21:15:43,759 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-25 21:15:43,759 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 21:15:43,759 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 21:15:59,785 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 21:15:59,785 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 21:15:59,785 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 21:15:59,785 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 21:15:59,785 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-25 21:16:14,158 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 23%|█████████████████▌ | 258/1115 [1:37:28<5:14:24, 22.01s/it] {'loss': 4.1454, 'learning_rate': 0.0001536, 'epoch': 1.16}
[WARNING|modeling_utils.py:388] 2022-03-25 21:16:24,559 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 21:16:36,897 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
23%|█████████████████▋ | 259/1115 [1:37:49<5:08:03, 21.59s/it] {'loss': 4.0752, 'learning_rate': 0.00015419999999999998, 'epoch': 1.16} [WARNING|modeling_utils.py:388] 2022-03-25 21:16:45,109 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 21:16:50,922 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-25 21:16:55,115 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 23%|█████████████████▋ | 260/1115 [1:38:09<5:02:00, 21.19s/it] {'loss': 4.0471, 'learning_rate': 0.0001548, 'epoch': 1.17} [WARNING|modeling_bart.py:1051] 2022-03-25 21:17:03,433 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:17:09,532 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 21:17:13,559 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-25 21:17:19,650 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.1324, 'learning_rate': 0.00015539999999999998, 'epoch': 1.17} [WARNING|modeling_utils.py:388] 2022-03-25 21:17:25,516 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-25 21:17:27,847 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 21:17:32,083 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 21:17:36,011 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.0548, 'learning_rate': 0.000156, 'epoch': 1.17} [WARNING|modeling_bart.py:1051] 2022-03-25 21:17:40,259 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 21:17:44,108 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_bart.py:1051] 2022-03-25 21:17:50,194 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:17:52,397 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 21:17:54,666 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 24%|█████████████████▉ | 263/1115 [1:39:07<4:40:52, 19.78s/it] [WARNING|modeling_utils.py:388] 2022-03-25 21:17:58,508 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-25 21:18:00,663 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-25 21:18:02,841 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 21:18:04,955 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-25 21:18:07,090 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-25 21:18:09,210 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-25 21:18:11,330 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-25 21:18:13,391 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed {'loss': 4.1022, 'learning_rate': 0.0001572, 'epoch': 1.18} [WARNING|modeling_bart.py:1051] 2022-03-25 21:18:17,297 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:18:19,321 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 21:18:21,347 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 21:18:23,413 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 21:18:25,380 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 21:18:27,308 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 21:18:29,244 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:18:31,285 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 21:18:33,185 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 21:18:35,079 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 21:18:36,936 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 21:18:38,790 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 21:18:40,645 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 21:18:42,432 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:18:46,117 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 21:18:47,881 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 21:18:49,630 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 21:18:51,325 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 21:18:53,011 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 21:18:56,302 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:18:57,924 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 21:18:59,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 21:19:01,179 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 21:19:02,723 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 21:19:05,750 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 21:19:07,252 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:19:10,085 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 21:19:11,551 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 21:19:14,261 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 21:19:15,591 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 21:19:18,180 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 21:19:20,255 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:19:21,508 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 21:19:24,143 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 21:19:26,494 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 21:19:27,666 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 21:19:29,935 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 21:19:32,202 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:19:34,289 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 21:19:36,328 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 21:19:38,258 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 21:19:40,226 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 21:19:42,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:19:44,705 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 21:19:46,377 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 21:19:48,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 21:19:50,334 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 21:19:52,466 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:19:56,001 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:19:59,625 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:20:03,282 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:20:06,929 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:20:10,522 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:20:14,093 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:20:17,656 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:20:21,174 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:20:24,822 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:20:28,362 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:20:31,870 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:20:35,367 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:20:38,848 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:20:42,341 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:20:45,815 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:20:49,268 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 5.3449, 'learning_rate': 0.0001638, 'epoch': 1.23}
[WARNING|modeling_bart.py:1051] 2022-03-25 21:20:53,976 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:20:57,442 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:21:00,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:21:04,308 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:21:07,755 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:21:11,238 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:21:14,653 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:21:18,074 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.9906, 'learning_rate': 0.0001644, 'epoch': 1.24}
[WARNING|modeling_bart.py:1051] 2022-03-25 21:21:21,502 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:21:24,882 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:21:28,277 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:21:31,678 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:21:35,073 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:21:38,473 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:21:41,807 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:21:45,188 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.7723, 'learning_rate': 0.000165, 'epoch': 1.24}
[WARNING|modeling_bart.py:1051] 2022-03-25 21:21:45,188 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.5841, 'learning_rate': 0.0001656, 'epoch': 1.25}
[WARNING|modeling_bart.py:1051] 2022-03-25 21:21:45,188 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.5655, 'learning_rate': 0.0001662, 'epoch': 1.25}
[WARNING|modeling_bart.py:1051] 2022-03-25 21:21:45,188 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.5075, 'learning_rate': 0.0001668, 'epoch': 1.26}
[WARNING|modeling_bart.py:1051] 2022-03-25 21:21:45,188 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.3671, 'learning_rate': 0.0001674, 'epoch': 1.26}
[WARNING|modeling_bart.py:1051] 2022-03-25 21:21:45,188 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.4138, 'learning_rate': 0.000168, 'epoch': 1.26}
{'loss': 4.3305, 'learning_rate': 0.0001686, 'epoch': 1.27}
{'loss': 4.2737, 'learning_rate': 0.00016919999999999997, 'epoch': 1.27}
 26%|███████████████████▍ | 285/1115 [1:46:28<5:59:26, 25.98s/it]
{'loss': 4.3387, 'learning_rate': 0.00016979999999999998, 'epoch': 1.28}
{'loss': 4.218, 'learning_rate': 0.00017039999999999997, 'epoch': 1.28}
{'loss': 4.2599, 'learning_rate': 0.00017099999999999998, 'epoch': 1.29}
 26%|███████████████████▋ | 288/1115 [1:47:46<5:57:13, 25.92s/it]
{'loss': 4.2436, 'learning_rate': 0.00017159999999999997, 'epoch': 1.29}
{'loss': 4.1241, 'learning_rate': 0.00017219999999999998, 'epoch': 1.3}
{'loss': 4.2546, 'learning_rate': 0.00017279999999999997, 'epoch': 1.3}
{'loss': 4.1435, 'learning_rate': 0.00017339999999999996, 'epoch': 1.3}
{'loss': 4.0901, 'learning_rate': 0.00017399999999999997, 'epoch': 1.31}
{'loss': 4.1774, 'learning_rate': 0.00017459999999999996, 'epoch': 1.31}
{'loss': 4.1511, 'learning_rate': 0.00017519999999999998, 'epoch': 1.32}
[WARNING|modeling_utils.py:388] 2022-03-25 21:29:24,471 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.1222, 'learning_rate': 0.00017579999999999996, 'epoch': 1.32}
27%|████████████████████▏ | 296/1115 [1:51:05<5:37:39, 24.74s/it]
{'loss': 4.0986, 'learning_rate': 0.00017639999999999998, 'epoch': 1.33}
{'loss': 4.0735, 'learning_rate': 0.00017699999999999997, 'epoch': 1.33}
[WARNING|modeling_utils.py:388] 2022-03-25 21:30:27,734 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-25 21:30:40,566 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.0722, 'learning_rate': 0.00017759999999999998, 'epoch': 1.34}
{'loss': 4.162, 'learning_rate': 0.00017819999999999997, 'epoch': 1.34}
{'loss': 4.1414, 'learning_rate': 0.00017879999999999998, 'epoch': 1.35}
[WARNING|modeling_utils.py:388] 2022-03-25 21:31:43,604 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
27%|████████████████████▌ | 301/1115 [1:53:04<5:24:12, 23.90s/it]
{'loss': 4.1157, 'learning_rate': 0.00017939999999999997, 'epoch': 1.35}
[WARNING|modeling_utils.py:388] 2022-03-25 21:32:12,493 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.0798, 'learning_rate': 0.00017999999999999998, 'epoch': 1.35}
[WARNING|modeling_utils.py:388] 2022-03-25 21:32:26,840 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 21:32:31,094 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.1804, 'learning_rate': 0.00018059999999999997, 'epoch': 1.36}
{'loss': 4.0958, 'learning_rate': 0.00018119999999999999, 'epoch': 1.36}
{'loss': 4.0506, 'learning_rate': 0.00018179999999999997, 'epoch': 1.37}
[WARNING|modeling_utils.py:388] 2022-03-25 21:33:34,210 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 21:33:42,462 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
27%|████████████████████▊ | 306/1115 [1:54:57<5:05:16, 22.64s/it]
{'loss': 4.1091, 'learning_rate': 0.0001824, 'epoch': 1.37}
[WARNING|modeling_utils.py:388] 2022-03-25 21:33:50,654 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 21:33:56,917 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 21:33:56,917 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 21:33:56,917 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 21:33:56,917 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 21:33:56,917 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 28%|████████████████████▉ | 307/1115 [1:55:19<5:03:12, 22.52s/it]g-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 28%|████████████████████▉ | 307/1115 [1:55:19<5:03:12, 22.52s/it]g-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.1218, 'learning_rate': 0.00018299999999999998, 'epoch': 1.38} 28%|████████████████████▉ | 307/1115 [1:55:19<5:03:12, 22.52s/it]g-point operations will not be computed-25 21:02:11,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-25 21:34:15,557 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 21:34:23,386 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 21:34:29,871 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.1636, 'learning_rate': 0.0001836, 'epoch': 1.38}
[WARNING|modeling_utils.py:388] 2022-03-25 21:34:33,895 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 21:34:40,311 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 21:34:47,961 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.08, 'learning_rate': 0.00018419999999999998, 'epoch': 1.39}
[WARNING|modeling_utils.py:388] 2022-03-25 21:34:58,164 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 21:35:04,423 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 21:35:10,649 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.0701, 'learning_rate': 0.0001848, 'epoch': 1.39}
[WARNING|modeling_utils.py:388] 2022-03-25 21:35:16,904 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 21:35:22,988 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
28%|█████████████████████▏ | 311/1115 [1:56:41<4:37:29, 20.71s/it]
[WARNING|modeling_bart.py:1051] 2022-03-25 21:35:31,091 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 3.9545, 'learning_rate': 0.00018539999999999998, 'epoch': 1.39}
[WARNING|modeling_bart.py:1051] 2022-03-25 21:35:37,041 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-25 21:35:41,006 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-25 21:35:45,237 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-25 21:35:49,168 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 21:35:51,620 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 21:35:57,350 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 21:36:01,520 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 21:36:03,755 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 21:36:05,979 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 21:36:08,158 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.0669, 'learning_rate': 0.00018659999999999998, 'epoch': 1.4}
[WARNING|modeling_bart.py:1051] 2022-03-25 21:36:12,255 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-25 21:36:15,866 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 21:36:18,012 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-25 21:36:21,861 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:36:23,928 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
28%|█████████████████████▍ | 314/1115 [1:57:36<4:14:00, 19.03s/it]
[WARNING|modeling_bart.py:1051] 2022-03-25 21:36:26,075 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:36:28,137 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:36:30,125 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:36:32,134 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:36:34,102 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:36:36,069 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:36:37,979 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:36:39,901 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
28%|█████████████████████▍ | 315/1115 [1:57:52<4:01:14, 18.09s/it]
[WARNING|modeling_bart.py:1051] 2022-03-25 21:36:41,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:36:43,807 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:36:45,668 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:36:47,514 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:36:49,304 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:36:52,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:36:54,631 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
28%|█████████████████████▌ | 316/1115 [1:58:06<3:46:57, 17.04s/it]
[WARNING|modeling_bart.py:1051] 2022-03-25 21:36:56,466 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:36:58,177 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:36:59,848 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:37:01,542 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:37:04,812 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:37:06,429 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
28%|█████████████████████▌ | 317/1115 [1:58:19<3:31:42, 15.92s/it]
[WARNING|modeling_bart.py:1051] 2022-03-25 21:37:09,710 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:37:11,290 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:37:12,831 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:37:14,324 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:37:17,313 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:37:18,775 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:37:20,212 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:37:21,764 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:37:23,137 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:37:24,509 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:37:27,188 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:37:28,501 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:37:30,533 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
29%|█████████████████████▋ | 319/1115 [1:58:43<3:02:46, 13.78s/it]
[WARNING|modeling_bart.py:1051] 2022-03-25 21:37:33,215 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:37:34,493 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:37:36,947 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:37:39,319 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:37:40,461 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
29%|█████████████████████▊ | 320/1115 [1:58:53<2:46:26, 12.56s/it]
[WARNING|modeling_bart.py:1051] 2022-03-25 21:37:42,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:37:45,037 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:37:47,153 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:37:49,204 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
29%|█████████████████████▉ | 321/1115 [1:59:01<2:30:08, 11.35s/it]
[WARNING|modeling_bart.py:1051] 2022-03-25 21:37:51,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:37:53,165 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:37:55,093 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:37:57,720 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:37:58,663 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:37:59,487 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:38:01,096 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:38:03,316 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
29%|██████████████████████ | 323/1115 [1:59:15<1:58:27, 8.97s/it]
[WARNING|modeling_bart.py:1051] 2022-03-25 21:38:05,887 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:38:09,667 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:38:13,374 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:38:17,007 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:38:20,619 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:38:24,183 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:38:27,756 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:38:31,327 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
29%|██████████████████████ | 324/1115 [1:59:44<3:18:14, 15.04s/it]
[WARNING|modeling_bart.py:1051] 2022-03-25 21:38:34,968 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:38:38,470 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:38:42,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:38:45,497 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:38:49,026 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:38:52,484 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:38:55,914 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:38:59,452 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
29%|██████████████████████▏ | 325/1115 [2:00:13<4:13:33, 19.26s/it]
[WARNING|modeling_bart.py:1051] 2022-03-25 21:39:04,115 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:39:07,596 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:39:11,063 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:39:14,502 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:39:17,968 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:39:21,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:39:24,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:39:28,359 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
29%|██████████████████████▏ | 326/1115 [2:00:41<4:47:04, 21.83s/it]
[WARNING|modeling_bart.py:1051] 2022-03-25 21:39:31,821 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:39:35,188 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:39:38,589 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:39:41,938 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:39:45,312 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:39:48,754 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:39:52,163 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:39:55,590 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
29%|██████████████████████▎ | 327/1115 [2:01:08<5:08:00, 23.45s/it]
{'loss': 4.9718, 'learning_rate': 0.000195, 'epoch': 1.47}
{'loss': 4.8634, 'learning_rate': 0.00019559999999999998, 'epoch': 1.47}
{'loss': 5.2309, 'learning_rate': 0.00019559999999999998, 'epoch': 1.48}
{'loss': 5.1864, 'learning_rate': 0.0001962, 'epoch': 1.48}
{'loss': 7.7546, 'learning_rate': 0.00019679999999999999, 'epoch': 1.48}
{'loss': 7.6626, 'learning_rate': 0.0001974, 'epoch': 1.49}
{'loss': 7.7001, 'learning_rate': 0.000198, 'epoch': 1.49}
{'loss': 7.4637, 'learning_rate': 0.0001986, 'epoch': 1.5}
30%|██████████████████████▊ | 335/1115 [2:04:41<5:39:40, 26.13s/it]
{'loss': 7.272, 'learning_rate': 0.0001992, 'epoch': 1.5}
{'loss': 7.0608, 'learning_rate': 0.0001998, 'epoch': 1.51}
{'loss': 6.811, 'learning_rate': 0.0002004, 'epoch': 1.51}
{'loss': 6.8756, 'learning_rate': 0.000201, 'epoch': 1.52}
{'loss': 6.0821, 'learning_rate': 0.0002016, 'epoch': 1.52}
{'loss': 5.2635, 'learning_rate': 0.0002022, 'epoch': 1.52}
{'loss': 5.1346, 'learning_rate': 0.0002028, 'epoch': 1.53}
31%|███████████████████████▎ | 342/1115 [2:07:40<5:26:12, 25.32s/it]
{'loss': 4.9469, 'learning_rate': 0.00020339999999999998, 'epoch': 1.53}
31%|███████████████████████▍ | 343/1115 [2:08:05<5:23:36, 25.15s/it]
{'loss': 4.7854, 'learning_rate': 0.000204, 'epoch': 1.54}
{'loss': 4.4968, 'learning_rate': 0.00020459999999999999, 'epoch': 1.54}
{'loss': 4.5434, 'learning_rate': 0.0002052, 'epoch': 1.55}
31%|███████████████████████▌ | 346/1115 [2:09:19<5:17:47, 24.80s/it]
{'loss': 4.4593, 'learning_rate': 0.0002058, 'epoch': 1.55}
{'loss': 4.4982, 'learning_rate': 0.00020639999999999998, 'epoch': 1.56}
{'loss': 4.5314, 'learning_rate': 0.00020699999999999996, 'epoch': 1.56}
31%|███████████████████████▊ | 349/1115 [2:10:30<5:07:36, 24.10s/it]
{'loss': 4.4604, 'learning_rate': 0.00020759999999999998, 'epoch': 1.57}
{'loss': 4.3669, 'learning_rate': 0.00020819999999999996, 'epoch': 1.57}
{'loss': 4.2684, 'learning_rate': 0.00020879999999999998, 'epoch': 1.57}
[WARNING|modeling_utils.py:388] 2022-03-25 21:50:23,588 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.3108, 'learning_rate': 0.00020939999999999997, 'epoch': 1.58}
[WARNING|modeling_utils.py:388] 2022-03-25 21:50:46,396 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.3765, 'learning_rate': 0.00020999999999999998, 'epoch': 1.58}
{'loss': 4.5628, 'learning_rate': 0.00021059999999999997, 'epoch': 1.59}
[WARNING|modeling_utils.py:388] 2022-03-25 21:51:23,164 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.3952, 'learning_rate': 0.00021119999999999996, 'epoch': 1.59}
[WARNING|modeling_bart.py:1051] 2022-03-25 21:51:42,156 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-25 21:51:56,428 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 21:52:00,562 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.2404, 'learning_rate': 0.00021179999999999997, 'epoch': 1.6}
[WARNING|modeling_utils.py:388] 2022-03-25 21:52:04,703 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 21:52:10,922 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 21:52:14,863 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
32%|████████████████████████▎ | 357/1115 [2:13:33<4:44:46, 22.54s/it]
{'loss': 4.1993, 'learning_rate': 0.00021239999999999996, 'epoch': 1.6}
[WARNING|modeling_utils.py:388] 2022-03-25 21:52:29,415 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 21:52:39,813 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
32%|████████████████████████▍ | 358/1115 [2:13:54<4:38:15, 22.06s/it]
{'loss': 4.24, 'learning_rate': 0.00021299999999999997, 'epoch': 1.61}
[WARNING|modeling_utils.py:388] 2022-03-25 21:52:50,213 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 21:52:54,087 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-25 21:53:02,624 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:53:06,511 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-25 21:53:10,776 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-25 21:53:19,057 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
32%|████████████████████████▌ | 360/1115 [2:14:35<4:26:32, 21.18s/it]
[WARNING|modeling_utils.py:388] 2022-03-25 21:53:27,043 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 21:53:36,901 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 21:53:42,923 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.2105, 'learning_rate': 0.00021479999999999996, 'epoch': 1.62}
[WARNING|modeling_bart.py:1051] 2022-03-25 21:53:47,385 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-25 21:53:51,377 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 21:53:57,238 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 21:53:59,580 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
32%|████████████████████████▋ | 362/1115 [2:15:13<4:13:27, 20.20s/it]
[WARNING|modeling_utils.py:388] 2022-03-25 21:54:05,537 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 21:54:07,828 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-25 21:54:11,948 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:54:16,140 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-25 21:54:19,883 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 21:54:22,117 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.1486, 'learning_rate': 0.00021599999999999996, 'epoch': 1.63}
[WARNING|modeling_bart.py:1051] 2022-03-25 21:54:26,286 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:54:28,413 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:54:30,532 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:54:32,654 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:54:34,733 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:54:36,822 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-25 21:54:40,400 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.0632, 'learning_rate': 0.00021659999999999998, 'epoch': 1.63}
[WARNING|modeling_bart.py:1051] 2022-03-25 21:54:44,109 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:54:46,143 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:54:48,128 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:54:50,091 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:54:52,010 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:54:53,936 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
33%|████████████████████████▉ | 365/1115 [2:16:06<3:46:30, 18.12s/it]
[WARNING|modeling_bart.py:1051] 2022-03-25 21:54:55,961 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:54:57,855 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:54:59,732 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:55:01,584 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:55:03,405 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:55:05,238 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:55:07,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
33%|████████████████████████▉ | 366/1115 [2:16:20<3:33:45, 17.12s/it]
[WARNING|modeling_bart.py:1051] 2022-03-25 21:55:10,685 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:55:12,402 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:55:14,122 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:55:15,806 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:55:17,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:55:20,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:55:22,381 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:55:10,685 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 33%|█████████████████████████ | 367/1115 [2:16:34<3:19:42, 16.02s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:55:24,079 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 33%|█████████████████████████ | 367/1115 [2:16:34<3:19:42, 16.02s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:55:24,079 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 21:55:27,236 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:55:24,079 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 21:55:28,763 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:55:24,079 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 21:55:30,287 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:55:24,079 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 21:55:31,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:55:24,079 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 21:55:34,839 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-25 21:55:24,079 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 21:55:34,839 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:55:24,079 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 33%|█████████████████████████ | 368/1115 [2:16:46<3:05:53, 14.93s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:55:36,427 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 21:55:37,834 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:55:36,427 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 21:55:40,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:55:36,427 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 21:55:41,971 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:55:36,427 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 21:55:43,290 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 21:55:36,427 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 33%|█████████████████████████▏ | 369/1115 [2:16:58<2:53:23, 13.95s/it] Setting `use_cache=False`...1] 2022-03-25 21:55:36,427 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
33%|█████████████████████████▏ | 369/1115 [2:16:58<2:53:23, 13.95s/it]
[WARNING|modeling_bart.py:1051] 2022-03-25 21:55:48,069 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:55:49,293 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:55:50,541 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:55:52,919 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:55:55,261 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
33%|█████████████████████████▏ | 370/1115 [2:17:07<2:37:05, 12.65s/it]
[WARNING|modeling_bart.py:1051] 2022-03-25 21:55:57,570 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:55:58,655 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:56:00,718 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:56:02,718 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
33%|█████████████████████████▎ | 371/1115 [2:17:16<2:20:07, 11.30s/it]
[WARNING|modeling_bart.py:1051] 2022-03-25 21:56:05,689 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:56:07,496 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:56:09,283 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:56:11,041 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:56:12,810 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:56:13,628 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:56:15,240 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:56:17,414 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
33%|█████████████████████████▍ | 373/1115 [2:17:29<1:49:44, 8.87s/it]
[WARNING|modeling_bart.py:1051] 2022-03-25 21:56:19,893 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:56:23,570 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:56:27,183 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:56:30,772 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:56:34,329 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:56:37,918 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:56:41,435 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:56:45,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
34%|█████████████████████████▍ | 374/1115 [2:17:58<3:03:33, 14.86s/it]
[WARNING|modeling_bart.py:1051] 2022-03-25 21:56:48,660 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:56:52,163 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:56:55,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:56:59,264 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:57:02,779 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:57:06,232 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:57:09,712 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:57:13,151 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
34%|█████████████████████████▌ | 375/1115 [2:18:27<3:55:51, 19.12s/it]
[WARNING|modeling_bart.py:1051] 2022-03-25 21:57:17,754 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:57:21,172 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:57:24,667 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:57:28,138 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:57:31,571 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:57:35,003 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:57:38,471 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:57:41,893 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
34%|█████████████████████████▋ | 376/1115 [2:18:55<4:27:15, 21.70s/it]
[WARNING|modeling_bart.py:1051] 2022-03-25 21:57:45,385 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:57:48,733 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:57:52,118 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:57:55,527 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:57:58,881 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:58:02,246 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:58:05,599 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 21:58:08,951 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it]
[WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.525, 'learning_rate': 0.0002238, 'epoch': 1.69}
34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.4688, 'learning_rate': 0.00022439999999999998, 'epoch': 1.7} 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.3824, 'learning_rate': 0.000225, 'epoch': 1.7} 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.4258, 'learning_rate': 0.00022559999999999998, 'epoch': 1.7} 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 4.4924, 'learning_rate': 0.00022619999999999997, 'epoch': 1.71}
34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.3423, 'learning_rate': 0.00022679999999999998, 'epoch': 1.71}
34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.439, 'learning_rate': 0.00022739999999999997, 'epoch': 1.72}
34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.2055, 'learning_rate': 0.00022799999999999999, 'epoch': 1.72}
34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.1395, 'learning_rate': 0.00022859999999999997, 'epoch': 1.73}
34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.1919, 'learning_rate': 0.0002292, 'epoch': 1.73}
34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.1744, 'learning_rate': 0.00022979999999999997, 'epoch': 1.74}
34%|█████████████████████████▋ | 377/1115 [2:19:22<4:46:41, 23.31s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
35%|██████████████████████████▍ | 388/1115 [2:24:12<5:15:42, 26.06s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.1852, 'learning_rate': 0.0002304, 'epoch': 1.74}
35%|██████████████████████████▌ | 389/1115 [2:24:37<5:12:36, 25.84s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.1599, 'learning_rate': 0.00023099999999999998, 'epoch': 1.74}
35%|██████████████████████████▌ | 389/1115 [2:24:37<5:12:36, 25.84s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.1255, 'learning_rate': 0.0002316, 'epoch': 1.75}
35%|██████████████████████████▌ | 389/1115 [2:24:37<5:12:36, 25.84s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.1317, 'learning_rate': 0.00023219999999999998, 'epoch': 1.75}
35%|██████████████████████████▌ | 389/1115 [2:24:37<5:12:36, 25.84s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.1679, 'learning_rate': 0.0002328, 'epoch': 1.76}
{'loss': 4.1555, 'learning_rate': 0.00023339999999999998, 'epoch': 1.76} 35%|██████████████████████████▌ | 389/1115 [2:24:37<5:12:36, 25.84s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 35%|██████████████████████████▌ | 389/1115 [2:24:37<5:12:36, 25.84s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 35%|██████████████████████████▌ | 389/1115 [2:24:37<5:12:36, 25.84s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 35%|██████████████████████████▌ | 389/1115 [2:24:37<5:12:36, 25.84s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 35%|██████████████████████████▌ | 389/1115 [2:24:37<5:12:36, 25.84s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 35%|██████████████████████████▌ | 389/1115 [2:24:37<5:12:36, 25.84s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 35%|██████████████████████████▌ | 389/1115 [2:24:37<5:12:36, 25.84s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 35%|██████████████████████████▌ | 389/1115 [2:24:37<5:12:36, 25.84s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
35%|██████████████████████████▌ | 389/1115 [2:24:37<5:12:36, 25.84s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 35%|██████████████████████████▌ | 389/1115 [2:24:37<5:12:36, 25.84s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 35%|██████████████████████████▌ | 389/1115 [2:24:37<5:12:36, 25.84s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 35%|██████████████████████████▊ | 394/1115 [2:26:43<5:03:15, 25.24s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 35%|██████████████████████████▊ | 394/1115 [2:26:43<5:03:15, 25.24s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.1475, 'learning_rate': 0.000234, 'epoch': 1.77} 35%|██████████████████████████▊ | 394/1115 [2:26:43<5:03:15, 25.24s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 35%|██████████████████████████▊ | 394/1115 [2:26:43<5:03:15, 25.24s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 35%|██████████████████████████▊ | 394/1115 [2:26:43<5:03:15, 25.24s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
35%|██████████████████████████▊ | 394/1115 [2:26:43<5:03:15, 25.24s/it][WARNING|modeling_bart.py:1051] 2022-03-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 22:05:44,413 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 22:05:44,413 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 22:05:48,918 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 22:05:48,918 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 22:05:48,918 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 22:05:48,918 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
35%|██████████████████████████▉ | 395/1115 [2:27:07<4:59:47, 24.98s/it]
{'loss': 4.1145, 'learning_rate': 0.00023459999999999998, 'epoch': 1.77}
{'loss': 4.0143, 'learning_rate': 0.0002352, 'epoch': 1.78}
{'loss': 4.1586, 'learning_rate': 0.00023579999999999999, 'epoch': 1.78}
36%|███████████████████████████▏ | 398/1115 [2:28:19<4:50:38, 24.32s/it]
{'loss': 4.0736, 'learning_rate': 0.0002364, 'epoch': 1.78}
{'loss': 4.1268, 'learning_rate': 0.000237, 'epoch': 1.79}
[WARNING|modeling_utils.py:388] 2022-03-25 22:07:54,211 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 22:07:54,211 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 22:07:54,211 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.1166, 'learning_rate': 0.0002376, 'epoch': 1.79} [WARNING|modeling_utils.py:388] 2022-03-25 22:07:54,211 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 22:07:54,211 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 22:07:54,211 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 22:07:54,211 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-25 22:07:54,211 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 22:07:54,211 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 22:07:54,211 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 22:07:54,211 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 22:07:54,211 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 22:07:54,211 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 22:07:54,211 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 4.0384, 'learning_rate': 0.0002382, 'epoch': 1.8} [WARNING|modeling_utils.py:388] 2022-03-25 22:07:54,211 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 22:07:54,211 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 22:07:54,211 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 22:07:54,211 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 22:07:54,211 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 22:07:54,211 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 22:07:54,211 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-25 22:07:54,211 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 22:07:54,211 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 22:07:54,211 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 36%|███████████████████████████▍ | 402/1115 [2:29:53<4:41:12, 23.66s/it]g-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 36%|███████████████████████████▍ | 402/1115 [2:29:53<4:41:12, 23.66s/it]g-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 3.9957, 'learning_rate': 0.0002388, 'epoch': 1.8} [WARNING|modeling_utils.py:388] 2022-03-25 22:08:47,253 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 22:08:47,253 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-25 22:08:47,253 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 22:08:47,253 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 22:08:47,253 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 22:08:47,253 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 22:08:47,253 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 22:08:47,253 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 22:08:47,253 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-25 22:08:47,253 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 22:08:47,253 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.0793, 'learning_rate': 0.0002394, 'epoch': 1.81} [WARNING|modeling_utils.py:388] 2022-03-25 22:08:47,253 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 22:09:11,476 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 22:09:11,476 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 22:09:11,476 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 22:09:11,476 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-25 22:09:11,476 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 22:09:11,476 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 22:09:11,476 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 22:09:11,476 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 22:09:11,476 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 22:09:11,476 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.1288, 'learning_rate': 0.00023999999999999998, 'epoch': 1.81} [WARNING|modeling_utils.py:388] 2022-03-25 22:09:11,476 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-25 22:09:11,476 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 22:09:11,476 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 22:09:11,476 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 22:09:11,476 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 22:09:11,476 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 22:09:11,476 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 22:09:11,476 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 3.9214, 'learning_rate': 0.0002406, 'epoch': 1.82}
[WARNING|modeling_bart.py:1051] 2022-03-25 22:10:11,017 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 4.0041, 'learning_rate': 0.00024119999999999998, 'epoch': 1.82}
[WARNING|modeling_bart.py:1051] 2022-03-25 22:10:29,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
 37%|███████████████████████████▋ | 407/1115 [2:31:46<4:26:14, 22.56s/it]
{'loss': 4.0388, 'learning_rate': 0.0002418, 'epoch': 1.83}
[WARNING|modeling_bart.py:1051] 2022-03-25 22:10:45,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-25 22:10:53,706 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.989, 'learning_rate': 0.00024239999999999998, 'epoch': 1.83}
[WARNING|modeling_utils.py:388] 2022-03-25 22:11:08,071 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:11:14,473 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.0942, 'learning_rate': 0.000243, 'epoch': 1.83}
[WARNING|modeling_utils.py:388] 2022-03-25 22:11:28,558 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-25 22:11:36,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 3.9554, 'learning_rate': 0.00024359999999999999, 'epoch': 1.84}
[WARNING|modeling_bart.py:1051] 2022-03-25 22:11:43,177 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 22:11:46,853 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-25 22:11:51,039 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:11:57,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 4.0312, 'learning_rate': 0.00024419999999999997, 'epoch': 1.84}
[WARNING|modeling_utils.py:388] 2022-03-25 22:12:03,142 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:12:05,514 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:12:11,395 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-25 22:12:15,608 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 3.9571, 'learning_rate': 0.0002448, 'epoch': 1.85}
[WARNING|modeling_utils.py:388] 2022-03-25 22:12:19,668 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-25 22:12:23,852 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 22:12:28,209 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-25 22:12:31,950 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:12:34,143 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-25 22:12:38,262 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 22:12:40,447 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 22:12:42,638 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 22:12:44,797 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-25 22:12:48,425 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:12:50,539 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:12:52,669 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:12:54,862 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:12:56,876 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:12:58,863 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:13:00,841 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:13:02,805 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:13:04,731 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:13:06,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:13:08,540 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:13:10,525 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:13:12,427 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:13:14,303 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:13:16,129 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:13:17,937 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:13:19,733 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:13:23,371 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:13:25,234 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:13:26,988 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:13:28,710 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:13:30,415 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:13:32,114 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:13:35,399 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:13:37,009 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:13:38,725 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:13:40,305 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:13:43,415 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:13:44,939 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:13:46,463 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:13:49,401 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:13:50,926 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:13:52,332 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:13:55,059 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:13:56,430 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:13:58,579 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:14:01,135 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:14:02,542 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:14:05,001 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:14:07,380 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:14:09,675 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:14:11,986 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:14:13,091 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:14:15,210 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:14:17,212 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:14:19,177 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:14:22,059 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:14:23,821 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:14:25,583 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:14:28,169 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:14:29,798 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:14:31,987 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:14:32,665 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:14:36,459 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:14:40,048 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:14:43,567 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:14:47,049 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:14:50,519 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:14:54,056 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:14:57,540 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:15:01,097 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:15:04,665 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:15:08,132 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:15:11,565 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:15:15,042 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:15:18,449 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:15:21,866 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:15:25,268 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:15:29,538 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:15:33,157 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:15:36,507 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:15:39,918 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:15:43,299 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:15:46,629 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:15:49,973 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:15:53,319 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:15:56,634 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:16:00,114 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:16:03,403 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:16:06,759 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:16:10,095 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:16:13,366 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:16:16,632 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:16:19,978 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:16:23,277 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:16:26,657 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
38%|█████████████████████████████▏ | 428/1115 [2:38:00<4:33:56, 23.92s/it]
{'loss': 4.5779, 'learning_rate': 0.00025439999999999995, 'epoch': 1.92}
{'loss': 4.5219, 'learning_rate': 0.00025499999999999996, 'epoch': 1.92}
38%|█████████████████████████████▏ | 428/1115 [2:38:00<4:33:56, 23.92s/it]g-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 38%|█████████████████████████████▏ | 428/1115 [2:38:00<4:33:56, 23.92s/it]g-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 38%|█████████████████████████████▏ | 428/1115 [2:38:00<4:33:56, 23.92s/it]g-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 38%|█████████████████████████████▏ | 428/1115 [2:38:00<4:33:56, 23.92s/it]g-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 39%|█████████████████████████████▎ | 430/1115 [2:38:51<4:42:40, 24.76s/it]g-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 39%|█████████████████████████████▎ | 430/1115 [2:38:51<4:42:40, 24.76s/it]g-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.2696, 'learning_rate': 0.0002556, 'epoch': 1.93} 39%|█████████████████████████████▎ | 430/1115 [2:38:51<4:42:40, 24.76s/it]g-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 39%|█████████████████████████████▎ | 430/1115 [2:38:51<4:42:40, 24.76s/it]g-point operations will not be computed-25 21:58:12,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 4.1901, 'learning_rate': 0.0002562, 'epoch': 1.93}
39%|█████████████████████████████▍ | 432/1115 [2:39:42<4:46:55, 25.21s/it]
{'loss': 4.1369, 'learning_rate': 0.00025679999999999995, 'epoch': 1.94}
{'loss': 4.1342, 'learning_rate': 0.000258, 'epoch': 1.95}
{'loss': 4.1208, 'learning_rate': 0.0002586, 'epoch': 1.95}
[WARNING|modeling_utils.py:388] 2022-03-25 22:20:10,834 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
39%|█████████████████████████████▊ | 437/1115 [2:41:41<4:27:51, 23.70s/it]
{'loss': 4.0492, 'learning_rate': 0.00025979999999999997, 'epoch': 1.96}
39%|█████████████████████████████▊ | 438/1115 [2:42:04<4:25:05, 23.49s/it]
[WARNING|modeling_utils.py:388] 2022-03-25 22:20:57,873 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:21:13,762 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.9223, 'learning_rate': 0.000261, 'epoch': 1.97}
[WARNING|modeling_utils.py:388] 2022-03-25 22:21:24,231 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:21:34,428 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.9732, 'learning_rate': 0.00026159999999999996, 'epoch': 1.97}
[WARNING|modeling_bart.py:1051] 2022-03-25 22:21:42,765 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-25 22:21:50,405 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-25 22:21:54,638 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 3.9046, 'learning_rate': 0.0002622, 'epoch': 1.98}
[WARNING|modeling_utils.py:388] 2022-03-25 22:21:58,597 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:22:00,817 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:22:03,022 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:22:05,213 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-25 22:22:08,994 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 22:22:11,050 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
40%|██████████████████████████████▏ | 442/1115 [2:43:23<3:46:14, 20.17s/it]
[WARNING|modeling_bart.py:1051] 2022-03-25 22:22:13,188 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 22:22:15,165 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 22:22:17,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 22:22:19,016 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 22:22:20,911 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 22:22:22,739 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 22:22:24,506 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
40%|██████████████████████████████▏ | 443/1115 [2:43:38<3:28:17, 18.60s/it]
[WARNING|modeling_bart.py:1051] 2022-03-25 22:22:28,008 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 22:22:29,645 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 22:22:31,229 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 22:22:32,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 22:22:35,793 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 22:22:39,279 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
40%|██████████████████████████████▎ | 444/1115 [2:43:51<3:08:57, 16.90s/it]
[WARNING|modeling_bart.py:1051] 2022-03-25 22:22:40,879 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 22:22:43,431 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 22:22:45,773 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 22:22:47,934 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
40%|██████████████████████████████▎ | 445/1115 [2:44:00<2:43:23, 14.63s/it]
[WARNING|modeling_bart.py:1051] 2022-03-25 22:22:50,071 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 22:22:51,015 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 22:22:53,673 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 22:22:56,021 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
40%|██████████████████████████████▍ | 446/1115 [2:44:07<2:17:02, 12.29s/it]
[WARNING|modeling_bart.py:1051] 2022-03-25 22:22:57,908 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:01,717 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:05,398 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:09,018 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:12,635 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:16,177 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:19,750 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:23,261 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
40%|██████████████████████████████▍ | 447/1115 [2:44:36<3:13:05, 17.34s/it]
[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:30,373 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:33,860 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:37,333 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:40,816 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 5.2469, 'learning_rate': 0.00026639999999999997, 'epoch': 2.01}
{'loss': 4.9693, 'learning_rate': 0.000267, 'epoch': 2.01}
{'loss': 4.5121, 'learning_rate': 0.0002676, 'epoch': 2.02}
{'loss': 4.5022, 'learning_rate': 0.00026819999999999996, 'epoch': 2.02}
{'loss': 4.2283, 'learning_rate': 0.0002688, 'epoch': 2.03}
{'loss': 4.0816, 'learning_rate': 0.0002694, 'epoch': 2.03}
{'loss': 4.091, 'learning_rate': 0.00027, 'epoch': 2.04}
[WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 4.0024, 'learning_rate': 0.00027059999999999996, 'epoch': 2.04} [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 3.9794, 'learning_rate': 0.0002712, 'epoch': 2.04} [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 3.8077, 'learning_rate': 0.0002718, 'epoch': 2.05} [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 3.86, 'learning_rate': 0.0002724, 'epoch': 2.05} [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 3.6727, 'learning_rate': 0.00027299999999999997, 'epoch': 2.06} [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 3.7923, 'learning_rate': 0.0002736, 'epoch': 2.06} [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 22:23:44,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|███████████████████████████████▍ | 461/1115 [2:50:48<4:40:27, 25.73s/it] Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|███████████████████████████████▍ | 461/1115 [2:50:48<4:40:27, 25.73s/it] Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 3.7753, 'learning_rate': 0.0002742, 'epoch': 2.07} 41%|███████████████████████████████▍ | 461/1115 [2:50:48<4:40:27, 25.73s/it] Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
41%|███████████████████████████████▍ | 461/1115 [2:50:48<4:40:27, 25.73s/it] Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|███████████████████████████████▍ | 461/1115 [2:50:48<4:40:27, 25.73s/it] Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|███████████████████████████████▍ | 461/1115 [2:50:48<4:40:27, 25.73s/it] Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|███████████████████████████████▍ | 461/1115 [2:50:48<4:40:27, 25.73s/it] Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|███████████████████████████████▍ | 461/1115 [2:50:48<4:40:27, 25.73s/it] Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|███████████████████████████████▍ | 461/1115 [2:50:48<4:40:27, 25.73s/it] Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|███████████████████████████████▍ | 461/1115 [2:50:48<4:40:27, 25.73s/it] Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|███████████████████████████████▍ | 461/1115 [2:50:48<4:40:27, 25.73s/it] Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
41%|███████████████████████████████▍ | 461/1115 [2:50:48<4:40:27, 25.73s/it] Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|███████████████████████████████▍ | 461/1115 [2:50:48<4:40:27, 25.73s/it] Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|███████████████████████████████▍ | 461/1115 [2:50:48<4:40:27, 25.73s/it] Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 3.7465, 'learning_rate': 0.0002748, 'epoch': 2.07} 41%|███████████████████████████████▍ | 461/1115 [2:50:48<4:40:27, 25.73s/it] Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|███████████████████████████████▍ | 461/1115 [2:50:48<4:40:27, 25.73s/it] Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|███████████████████████████████▍ | 461/1115 [2:50:48<4:40:27, 25.73s/it] Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|███████████████████████████████▍ | 461/1115 [2:50:48<4:40:27, 25.73s/it] Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|███████████████████████████████▍ | 461/1115 [2:50:48<4:40:27, 25.73s/it] Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
41%|███████████████████████████████▍ | 461/1115 [2:50:48<4:40:27, 25.73s/it] Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|███████████████████████████████▍ | 461/1115 [2:50:48<4:40:27, 25.73s/it] Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|███████████████████████████████▍ | 461/1115 [2:50:48<4:40:27, 25.73s/it] Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|███████████████████████████████▍ | 461/1115 [2:50:48<4:40:27, 25.73s/it] Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|███████████████████████████████▍ | 461/1115 [2:50:48<4:40:27, 25.73s/it] Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 41%|███████████████████████████████▍ | 461/1115 [2:50:48<4:40:27, 25.73s/it] Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 42%|███████████████████████████████▌ | 463/1115 [2:51:40<4:39:31, 25.72s/it] Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 42%|███████████████████████████████▌ | 463/1115 [2:51:40<4:39:31, 25.72s/it] Setting `use_cache=False`...1] 2022-03-25 22:23:26,869 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 3.712, 'learning_rate': 0.00027539999999999997, 'epoch': 2.08}
{'loss': 3.5746, 'learning_rate': 0.000276, 'epoch': 2.08}
{'loss': 3.6256, 'learning_rate': 0.0002766, 'epoch': 2.09}
{'loss': 3.5238, 'learning_rate': 0.0002772, 'epoch': 2.09}
{'loss': 3.4411, 'learning_rate': 0.0002778, 'epoch': 2.09}
{'loss': 3.5065, 'learning_rate': 0.0002784, 'epoch': 2.1}
[WARNING|modeling_utils.py:388] 2022-03-25 22:32:38,571 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.5197, 'learning_rate': 0.000279, 'epoch': 2.1}
{'loss': 3.4927, 'learning_rate': 0.00027959999999999997, 'epoch': 2.11}
{'loss': 3.5846, 'learning_rate': 0.0002802, 'epoch': 2.11}
{'loss': 3.4224, 'learning_rate': 0.0002808, 'epoch': 2.12}
[WARNING|modeling_bart.py:1051] 2022-03-25 22:34:31,542 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 3.4171, 'learning_rate': 0.00028139999999999996, 'epoch': 2.12}
{'loss': 3.3167, 'learning_rate': 0.00028199999999999997, 'epoch': 2.13}
[WARNING|modeling_bart.py:1051] 2022-03-25 22:35:06,238 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 3.4456, 'learning_rate': 0.0002826, 'epoch': 2.13}
[WARNING|modeling_utils.py:388] 2022-03-25 22:35:34,649 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:35:38,923 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.4501, 'learning_rate': 0.00028319999999999994, 'epoch': 2.13}
{'loss': 3.3677, 'learning_rate': 0.00028379999999999996, 'epoch': 2.14}
[WARNING|modeling_utils.py:388] 2022-03-25 22:36:26,054 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.2927, 'learning_rate': 0.0002844, 'epoch': 2.14}
[WARNING|modeling_utils.py:388] 2022-03-25 22:36:42,348 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.3387, 'learning_rate': 0.000285, 'epoch': 2.15}
[WARNING|modeling_bart.py:1051] 2022-03-25 22:36:59,079 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-25 22:37:04,742 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
43%|████████████████████████████████▋ | 480/1115 [2:58:19<3:52:44, 21.99s/it]
{'loss': 3.3384, 'learning_rate': 0.00028559999999999995, 'epoch': 2.15}
[WARNING|modeling_utils.py:388] 2022-03-25 22:37:19,153 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:37:29,515 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.2779, 'learning_rate': 0.00028619999999999996, 'epoch': 2.16}
[WARNING|modeling_utils.py:388] 2022-03-25 22:37:49,547 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.2192, 'learning_rate': 0.0002868, 'epoch': 2.16}
[WARNING|modeling_utils.py:388] 2022-03-25 22:37:55,939 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:38:02,169 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:38:08,272 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:38:14,517 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-25 22:38:18,902 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-25 22:38:22,968 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:38:28,899 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.178, 'learning_rate': 0.00028799999999999995, 'epoch': 2.17}
[WARNING|modeling_utils.py:388] 2022-03-25 22:38:34,841 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:38:37,153 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-25 22:38:41,328 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 22:38:47,044 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
43%|█████████████████████████████████ | 485/1115 [2:59:59<3:29:28, 19.95s/it]
[WARNING|modeling_utils.py:388] 2022-03-25 22:38:51,010 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:38:53,245 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-25 22:38:57,359 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 22:38:59,572 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-25 22:39:03,365 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 44%|█████████████████████████████████▏ | 486/1115 [3:00:17<3:23:07, 19.38s/it] [WARNING|modeling_bart.py:1051] 2022-03-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 22:39:09,590 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 22:39:13,108 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:39:15,188 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-25 22:39:17,280 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-25 22:39:19,355 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-25 22:39:21,435 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-25 22:39:23,448 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-25 22:39:25,551 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:39:27,577 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-25 22:39:29,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-25 22:39:32,385 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-25 22:39:34,297 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-25 22:39:36,207 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-25 22:39:38,078 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-25 22:39:39,919 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:39:41,923 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-25 22:39:43,749 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-25 22:39:45,571 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-25 22:39:47,396 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-25 22:39:49,196 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-25 22:39:52,781 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:39:54,538 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-25 22:39:56,417 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-25 22:39:58,133 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-25 22:39:59,833 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-25 22:40:03,135 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-25 22:40:04,769 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:40:06,397 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-25 22:40:08,017 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-25 22:40:09,699 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-25 22:40:12,804 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-25 22:40:14,333 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-25 22:40:15,831 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:40:18,777 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-25 22:40:20,216 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-25 22:40:23,111 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-25 22:40:25,837 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-25 22:40:27,176 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-25 22:40:29,744 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:40:31,009 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-25 22:40:32,408 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-25 22:40:34,851 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-25 22:40:37,221 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-25 22:40:39,521 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-25 22:40:40,731 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:40:43,018 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-25 22:40:45,137 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-25 22:40:46,175 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-25 22:40:48,875 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-25 22:40:50,959 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:40:53,759 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-25 22:40:55,536 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-25 22:40:57,290 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-25 22:40:59,864 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-25 22:41:01,362 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-25 22:41:03,481 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 2.1926, 'learning_rate': 0.00029519999999999997, 'epoch': 2.22} [WARNING|modeling_utils.py:388] 2022-03-25 22:41:07,415 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-25 22:41:11,060 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-25 22:41:14,629 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:41:18,223 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-25 22:41:21,758 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-25 22:41:25,268 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-25 22:41:28,814 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:41:32,321 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-25 22:41:35,967 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-25 22:41:39,478 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-25 22:41:42,961 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:41:46,467 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-25 22:41:49,940 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-25 22:41:53,391 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-25 22:41:56,853 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:42:00,294 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-25 22:42:03,874 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-25 22:42:07,333 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-25 22:42:10,807 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:42:14,258 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-25 22:42:17,660 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-25 22:42:21,155 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-25 22:42:24,593 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:42:28,023 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-25 22:42:31,548 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-25 22:42:34,916 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed [WARNING|modeling_utils.py:388] 2022-03-25 22:42:38,358 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:42:41,758 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:42:45,164 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:42:48,609 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 22:42:51,990 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8
{'loss': 4.3539, 'learning_rate': 0.00029759999999999997, 'epoch': 2.24}
[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-25 22:42:56,448 >> Batch size = 8ot estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
03/25/2022 22:52:50 - INFO - datasets.metric - Removing /home/sanchit_huggingface_co/.cache/huggingface/metrics/wer/default/default_experiment-1-0.arrow
{'eval_loss': 4.050220966339111, 'eval_wer': 1.7867314557715193, 'eval_runtime': 594.0081, 'eval_samples_per_second': 4.448, 'eval_steps_per_second': 0.557, 'epoch': 2.24}
03/25/2022 22:54:14 - WARNING - huggingface_hub.repository - Adding files tracked by Git LFS: ['wandb/run-20220325_193848-1sz5964i/run-1sz5964i.wandb']. This may take a bit of time if the files are large.
{'loss': 4.1344, 'learning_rate': 0.0002982, 'epoch': 2.25}
{'loss': 4.2398, 'learning_rate': 0.0002988, 'epoch': 2.25}
{'loss': 4.0204, 'learning_rate': 0.00029939999999999996, 'epoch': 2.26}
 45%|█████████████████████████████████▉ | 504/1115 [3:17:44<16:56:54, 99.86s/it]
{'loss': 4.032, 'learning_rate': 0.0003, 'epoch': 2.26}
{'loss': 3.9013, 'learning_rate': 0.0002995121951219512, 'epoch': 2.26}
{'loss': 3.7927, 'learning_rate': 0.0002990243902439024, 'epoch': 2.27}
{'loss': 3.7671, 'learning_rate': 0.0002985365853658536, 'epoch': 2.27}
{'loss': 3.7971, 'learning_rate': 0.00029804878048780484, 'epoch': 2.28}
 46%|██████████████████████████████████▋ | 509/1115 [3:19:57<6:31:50, 38.80s/it]
{'loss': 3.7401, 'learning_rate': 0.00029756097560975606, 'epoch': 2.28}
{'loss': 3.5998, 'learning_rate': 0.0002970731707317073, 'epoch': 2.29}
{'loss': 3.6632, 'learning_rate': 0.0002965853658536585, 'epoch': 2.29}
{'loss': 3.6384, 'learning_rate': 0.0002960975609756097, 'epoch': 2.3}
{'loss': 3.6145, 'learning_rate': 0.0002956097560975609, 'epoch': 2.3}
{'loss': 3.5137, 'learning_rate': 0.0002951219512195122, 'epoch': 2.3}
{'loss': 3.5113, 'learning_rate': 0.00029463414634146336, 'epoch': 2.31}
{'loss': 3.4298, 'learning_rate': 0.0002941463414634146, 'epoch': 2.31}
{'loss': 3.3688, 'learning_rate': 0.00029365853658536585, 'epoch': 2.32}
{'loss': 3.3792, 'learning_rate': 0.00029317073170731706, 'epoch': 2.32}
{'loss': 3.4692, 'learning_rate': 0.0002926829268292683, 'epoch': 2.33}
{'loss': 3.3661, 'learning_rate': 0.0002921951219512195, 'epoch': 2.33}
 47%|███████████████████████████████████▌ | 521/1115 [3:24:58<4:03:57, 24.64s/it]
{'loss': 3.4411, 'learning_rate': 0.0002917073170731707, 'epoch': 2.34}
[WARNING|modeling_utils.py:388] 2022-03-25 23:03:52,577 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.509, 'learning_rate': 0.00029121951219512193, 'epoch': 2.34}
 47%|███████████████████████████████████▋ | 523/1115 [3:25:46<3:58:09, 24.14s/it]
{'loss': 3.3465, 'learning_rate': 0.00029073170731707315, 'epoch': 2.35}
[WARNING|modeling_utils.py:388] 2022-03-25 23:04:39,889 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.2509, 'learning_rate': 0.00029024390243902437, 'epoch': 2.35}
[WARNING|modeling_utils.py:388] 2022-03-25 23:05:06,120 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:05:20,462 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 3.1902, 'learning_rate': 0.0002897560975609756, 'epoch': 2.35}
[WARNING|modeling_bart.py:1051] 2022-03-25 23:05:35,389 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 3.2894, 'learning_rate': 0.0002892682926829268, 'epoch': 2.36}
[WARNING|modeling_bart.py:1051] 2022-03-25 23:05:49,813 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 3.2102, 'learning_rate': 0.000288780487804878, 'epoch': 2.36}
{'loss': 2.9926, 'learning_rate': 0.00028829268292682923, 'epoch': 2.37}
[WARNING|modeling_bart.py:1051] 2022-03-25 23:05:49,813 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:06:41,634 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:06:41,634 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:06:41,634 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:06:41,634 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:06:41,634 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:06:41,634 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-25 23:06:41,634 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:06:41,634 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:06:41,634 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 3.1141, 'learning_rate': 0.00028780487804878045, 'epoch': 2.37} [WARNING|modeling_utils.py:388] 2022-03-25 23:06:41,634 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:06:41,634 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:06:41,634 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:06:41,634 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-25 23:06:41,634 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:06:41,634 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:06:41,634 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:06:41,634 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 48%|████████████████████████████████████▏ | 530/1115 [3:28:26<3:40:44, 22.64s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 48%|████████████████████████████████████▏ | 530/1115 [3:28:26<3:40:44, 22.64s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 2.9693, 'learning_rate': 0.00028731707317073167, 'epoch': 2.38} 48%|████████████████████████████████████▏ | 530/1115 [3:28:26<3:40:44, 22.64s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
48%|████████████████████████████████████▏ | 530/1115 [3:28:26<3:40:44, 22.64s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 48%|████████████████████████████████████▏ | 530/1115 [3:28:26<3:40:44, 22.64s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 48%|████████████████████████████████████▏ | 530/1115 [3:28:26<3:40:44, 22.64s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 23:07:28,707 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 23:07:28,707 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 23:07:28,707 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 23:07:28,707 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 23:07:28,707 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 23:07:28,707 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:07:38,467 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:07:38,467 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:07:38,467 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:07:38,467 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:07:38,467 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-25 23:07:38,467 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:07:38,467 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:07:38,467 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:07:54,934 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:07:54,934 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 48%|████████████████████████████████████▎ | 532/1115 [3:29:09<3:34:24, 22.07s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 48%|████████████████████████████████████▎ | 532/1115 [3:29:09<3:34:24, 22.07s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 3.0293, 'learning_rate': 0.0002863414634146341, 'epoch': 2.39} 48%|████████████████████████████████████▎ | 532/1115 [3:29:09<3:34:24, 22.07s/it]g-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:08:05,384 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:08:05,384 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:08:05,384 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:08:05,384 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:08:13,027 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:08:13,027 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-25 23:08:13,027 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:08:19,285 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:08:19,285 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 2.9632, 'learning_rate': 0.00028585365853658537, 'epoch': 2.39} [WARNING|modeling_utils.py:388] 2022-03-25 23:08:19,285 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:08:25,608 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:08:25,608 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:08:25,608 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 22:39:07,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-25 23:08:31,802 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:08:37,874 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 2.8581, 'learning_rate': 0.00028536585365853654, 'epoch': 2.39}
[WARNING|modeling_bart.py:1051] 2022-03-25 23:08:42,370 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-25 23:08:46,358 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:08:52,227 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-25 23:08:56,518 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
48%|████████████████████████████████████▍ | 535/1115 [3:30:08<3:17:39, 20.45s/it]
[WARNING|modeling_utils.py:388] 2022-03-25 23:09:00,551 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-25 23:09:04,679 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:09:06,934 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-25 23:09:10,753 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:09:12,966 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
48%|████████████████████████████████████▌ | 536/1115 [3:30:27<3:10:42, 19.76s/it]
[WARNING|modeling_bart.py:1051] 2022-03-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:09:19,319 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-25 23:09:22,900 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-25 23:09:26,785 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:09:28,889 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:09:30,967 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:09:33,035 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:09:35,212 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:09:37,264 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:09:39,299 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:09:41,350 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-25 23:09:45,606 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:09:47,554 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:09:49,505 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:09:51,599 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:09:53,512 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:09:55,401 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:09:57,267 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:09:59,127 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:10:00,975 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:10:02,786 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:10:04,594 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:10:06,480 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:10:10,002 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:10:11,759 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:10:13,462 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:10:15,141 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:10:16,784 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:10:20,122 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:10:21,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:10:23,330 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:10:24,878 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:10:27,955 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:10:29,447 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:10:29,447 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:10:32,448 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:10:33,855 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:10:36,658 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:10:38,052 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-25 23:10:40,681 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:10:41,977 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:10:41,977 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:10:44,586 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:10:45,794 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:10:48,152 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:10:50,418 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-25 23:10:50,418 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:10:52,725 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:10:54,787 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:10:56,826 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:10:57,810 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:11:00,560 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:11:00,560 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-25 23:11:02,564 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:11:04,337 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:11:06,959 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:11:06,959 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:11:08,729 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:11:10,403 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:11:11,975 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-25 23:11:14,139 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:11:14,139 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:11:16,481 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:11:16,481 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:11:20,092 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:11:20,092 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:11:23,772 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-25 23:11:23,772 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:11:27,404 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:11:27,404 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:11:31,046 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:11:34,682 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:11:34,682 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:11:38,251 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-25 23:11:38,251 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:11:41,833 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:11:41,833 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:11:45,444 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:11:45,444 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:11:49,010 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:11:49,010 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-25 23:11:52,538 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:11:52,538 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:11:56,044 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:11:59,513 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:11:59,513 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:12:03,014 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:12:03,014 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-25 23:12:06,518 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:12:06,518 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:12:10,008 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:12:10,008 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:12:13,604 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:12:13,604 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:12:17,157 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-25 23:12:17,157 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:12:20,621 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:12:24,150 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:12:24,150 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:12:27,618 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:12:27,618 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:12:31,115 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-25 23:12:34,591 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:12:34,591 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:12:38,025 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:12:38,025 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:12:38,025 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:12:41,504 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:12:44,991 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-25 23:12:44,991 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:12:48,404 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:12:48,404 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:12:51,792 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:12:51,792 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:12:55,191 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:12:58,591 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 3.93, 'learning_rate': 0.00027756097560975606, 'epoch': 2.47}
{'loss': 3.6004, 'learning_rate': 0.0002770731707317073, 'epoch': 2.47}
{'loss': 3.4879, 'learning_rate': 0.00027658536585365855, 'epoch': 2.48}
[WARNING|modeling_utils.py:388] 2022-03-25 23:13:57,147 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:13:57,147 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:13:57,147 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:13:57,147 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:13:57,147 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:13:57,147 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:13:57,147 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 3.3519, 'learning_rate': 0.0002760975609756097, 'epoch': 2.48}
{'loss': 3.2981, 'learning_rate': 0.00027560975609756093, 'epoch': 2.48}
{'loss': 3.1892, 'learning_rate': 0.0002751219512195122, 'epoch': 2.49}
{'loss': 3.1885, 'learning_rate': 0.00027463414634146336, 'epoch': 2.49}
{'loss': 2.9167, 'learning_rate': 0.00027414634146341463, 'epoch': 2.5}
{'loss': 2.8781, 'learning_rate': 0.00027365853658536585, 'epoch': 2.5}
{'loss': 2.8923, 'learning_rate': 0.000273170731707317, 'epoch': 2.51}
{'loss': 2.6902, 'learning_rate': 0.0002726829268292683, 'epoch': 2.51}
{'loss': 2.6606, 'learning_rate': 0.0002721951219512195, 'epoch': 2.52}
{'loss': 2.5854, 'learning_rate': 0.0002717073170731707, 'epoch': 2.52}
{'loss': 2.4889, 'learning_rate': 0.00027121951219512193, 'epoch': 2.52}
{'loss': 2.3526, 'learning_rate': 0.00027073170731707315, 'epoch': 2.53}
 51%|██████████████████████████████████████▌ | 565/1115 [3:41:03<3:56:04, 25.75s/it]
{'loss': 2.3056, 'learning_rate': 0.00027024390243902437, 'epoch': 2.53}
 51%|██████████████████████████████████████▌ | 566/1115 [3:41:28<3:53:07, 25.48s/it]
{'loss': 2.2212, 'learning_rate': 0.0002697560975609756, 'epoch': 2.54}
 51%|██████████████████████████████████████▋ | 567/1115 [3:41:53<3:50:43, 25.26s/it]
{'loss': 2.1994, 'learning_rate': 0.0002692682926829268, 'epoch': 2.54}
{'loss': 2.1812, 'learning_rate': 0.000268780487804878, 'epoch': 2.55}
{'loss': 2.058, 'learning_rate': 0.00026829268292682924, 'epoch': 2.55}
{'loss': 1.9894, 'learning_rate': 0.00026780487804878045, 'epoch': 2.56}
{'loss': 2.0365, 'learning_rate': 0.0002673170731707317, 'epoch': 2.56}
[WARNING|modeling_utils.py:388] 2022-03-25 23:22:44,271 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 1.9143, 'learning_rate': 0.0002668292682926829, 'epoch': 2.57}
[WARNING|modeling_utils.py:388] 2022-03-25 23:22:44,271 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:22:44,271 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:22:44,271 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:22:44,271 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:22:44,271 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:22:44,271 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:22:44,271 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-25 23:22:44,271 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 1.7815, 'learning_rate': 0.0002663414634146341, 'epoch': 2.57} [WARNING|modeling_bart.py:1051] 2022-03-25 23:23:13,186 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 23:23:13,186 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 23:23:13,186 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 23:23:13,186 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 23:23:13,186 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 23:23:13,186 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-25 23:23:13,186 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 23:23:13,186 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 23:23:13,186 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 51%|███████████████████████████████████████ | 574/1115 [3:44:42<3:35:16, 23.88s/it] Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 51%|███████████████████████████████████████ | 574/1115 [3:44:42<3:35:16, 23.88s/it] Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 1.728, 'learning_rate': 0.0002658536585365854, 'epoch': 2.57} 51%|███████████████████████████████████████ | 574/1115 [3:44:42<3:35:16, 23.88s/it] Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 51%|███████████████████████████████████████ | 574/1115 [3:44:42<3:35:16, 23.88s/it] Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
51%|███████████████████████████████████████ | 574/1115 [3:44:42<3:35:16, 23.88s/it] Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 51%|███████████████████████████████████████ | 574/1115 [3:44:42<3:35:16, 23.88s/it] Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 51%|███████████████████████████████████████ | 574/1115 [3:44:42<3:35:16, 23.88s/it] Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 51%|███████████████████████████████████████ | 574/1115 [3:44:42<3:35:16, 23.88s/it] Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 51%|███████████████████████████████████████ | 574/1115 [3:44:42<3:35:16, 23.88s/it] Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:23:49,932 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:23:49,932 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:23:49,932 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-25 23:23:49,932 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 1.7936, 'learning_rate': 0.00026536585365853654, 'epoch': 2.58} [WARNING|modeling_utils.py:388] 2022-03-25 23:23:49,932 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:23:49,932 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:23:49,932 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:23:49,932 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 23:24:06,396 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 23:24:06,396 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-25 23:24:06,396 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 23:24:06,396 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 23:24:06,396 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 23:24:06,396 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 23:24:06,396 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 23:24:06,396 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 1.7649, 'learning_rate': 0.0002648780487804878, 'epoch': 2.58} [WARNING|modeling_bart.py:1051] 2022-03-25 23:24:06,396 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-25 23:24:06,396 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:24:26,993 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:24:26,993 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:24:26,993 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:24:26,993 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:24:26,993 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:24:26,993 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-25 23:24:26,993 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 52%|███████████████████████████████████████▎ | 577/1115 [3:45:51<3:29:43, 23.39s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 52%|███████████████████████████████████████▎ | 577/1115 [3:45:51<3:29:43, 23.39s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 1.6668, 'learning_rate': 0.000264390243902439, 'epoch': 2.59} 52%|███████████████████████████████████████▎ | 577/1115 [3:45:51<3:29:43, 23.39s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 52%|███████████████████████████████████████▎ | 577/1115 [3:45:51<3:29:43, 23.39s/it]g-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:24:49,558 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:24:49,558 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-25 23:24:49,558 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:24:49,558 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:24:49,558 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:24:49,558 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:24:49,558 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:24:49,558 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:24:49,558 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 1.5596, 'learning_rate': 0.0002639024390243902, 'epoch': 2.59}
[WARNING|modeling_utils.py:388] 2022-03-25 23:25:07,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:25:11,939 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:25:20,186 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:25:24,282 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:25:28,406 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:25:42,937 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
52%|███████████████████████████████████████▌ | 580/1115 [3:46:57<3:19:19, 22.35s/it]
{'loss': 1.5524, 'learning_rate': 0.0002629268292682927, 'epoch': 2.6}
[WARNING|modeling_utils.py:388] 2022-03-25 23:26:02,853 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:26:05,488 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 1.426, 'learning_rate': 0.0002624390243902439, 'epoch': 2.61}
[WARNING|modeling_utils.py:388] 2022-03-25 23:26:29,716 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 1.4211, 'learning_rate': 0.0002619512195121951, 'epoch': 2.61}
[WARNING|modeling_utils.py:388] 2022-03-25 23:26:40,051 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:26:46,348 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
52%|███████████████████████████████████████▋ | 583/1115 [3:48:00<3:09:58, 21.43s/it]
{'loss': 1.4552, 'learning_rate': 0.0002614634146341463, 'epoch': 2.61}
[WARNING|modeling_bart.py:1051] 2022-03-25 23:26:54,705 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-25 23:26:58,861 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:27:08,643 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 1.3657, 'learning_rate': 0.00026097560975609754, 'epoch': 2.62}
[WARNING|modeling_utils.py:388] 2022-03-25 23:27:14,815 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:27:20,764 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-25 23:27:25,070 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:27:27,422 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
52%|███████████████████████████████████████▊ | 585/1115 [3:48:39<3:00:11, 20.40s/it]
52%|███████████████████████████████████████▊ | 585/1115 [3:48:39<3:00:11, 20.40s/it] Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:27:31,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:27:31,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:27:34,882 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:27:37,135 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:27:39,415 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:27:39,415 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 23:27:43,565 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 23:27:43,565 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:27:47,295 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:27:47,295 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:27:49,601 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:27:51,761 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:27:53,911 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-25 23:27:57,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:27:59,856 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:28:01,935 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:28:04,024 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:28:06,215 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:28:08,211 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:28:10,197 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:28:12,151 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-25 23:28:16,346 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:28:18,282 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:28:20,187 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:28:22,221 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:28:24,077 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:28:25,936 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:28:27,764 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:28:29,569 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:28:31,360 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:28:33,131 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:28:36,764 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:28:38,447 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:28:40,188 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:28:41,847 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:28:43,493 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:28:46,738 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:28:48,329 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:28:50,008 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:28:53,141 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:28:54,657 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:28:56,175 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:28:59,136 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:29:00,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:29:02,089 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:29:04,851 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:29:07,532 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:29:08,843 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:29:11,436 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:29:12,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:29:15,294 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:29:16,507 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:29:18,878 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:29:21,182 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:29:23,522 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:29:25,711 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:29:27,782 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:29:29,543 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:29:31,633 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:29:33,487 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:29:35,280 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:29:37,936 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:29:39,777 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:29:41,428 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:29:43,672 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:29:44,385 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:29:46,895 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:29:50,577 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:29:54,263 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:29:57,905 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:29:57,905 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:30:01,528 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:30:01,528 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:30:05,130 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:30:05,130 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:30:08,687 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-25 23:30:12,250 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:30:12,250 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:30:12,250 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:30:15,905 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:30:15,905 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:30:19,418 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:30:19,418 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-25 23:30:22,981 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:30:26,505 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:30:26,505 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:30:30,005 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:30:30,005 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:30:33,505 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:30:33,505 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-25 23:30:36,978 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-25 23:30:40,485 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-25 23:30:44,132 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-25 23:30:47,646 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-25 23:30:51,077 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-25 23:30:54,552 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-25 23:30:58,004 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-25 23:31:01,427 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-25 23:31:04,977 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-25 23:31:08,437 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-25 23:31:12,024 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-25 23:31:15,518 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-25 23:31:18,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-25 23:31:22,278 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-25 23:31:25,721 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-25 23:31:29,124 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-25 23:31:32,521 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 3.6587, 'learning_rate': 0.00025317073170731707, 'epoch': 2.69}
{'loss': 3.0331, 'learning_rate': 0.0002526829268292683, 'epoch': 2.7}
{'loss': 2.6576, 'learning_rate': 0.0002521951219512195, 'epoch': 2.7}
{'loss': 2.4141, 'learning_rate': 0.0002517073170731707, 'epoch': 2.7}
{'loss': 2.2096, 'learning_rate': 0.00025121951219512194, 'epoch': 2.71}
{'loss': 2.0265, 'learning_rate': 0.00025073170731707315, 'epoch': 2.71}
{'loss': 1.9003, 'learning_rate': 0.00025024390243902437, 'epoch': 2.72}
[WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 1.7156, 'learning_rate': 0.0002497560975609756, 'epoch': 2.72} [WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:31:35,956 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:09:17,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 1.6201, 'learning_rate': 0.0002492682926829268, 'epoch': 2.73}
{'loss': 1.4503, 'learning_rate': 0.000248780487804878, 'epoch': 2.73}
{'loss': 1.4161, 'learning_rate': 0.00024829268292682924, 'epoch': 2.74}
{'loss': 1.2728, 'learning_rate': 0.00024780487804878045, 'epoch': 2.74}
{'loss': 1.2627, 'learning_rate': 0.00024731707317073167, 'epoch': 2.74}
{'loss': 1.301, 'learning_rate': 0.0002468292682926829, 'epoch': 2.75}
{'loss': 1.1987, 'learning_rate': 0.00024634146341463416, 'epoch': 2.75}
{'loss': 1.0851, 'learning_rate': 0.0002458536585365853, 'epoch': 2.76}
55%|█████████████████████████████████████████▉ | 616/1115 [3:59:48<3:31:16, 25.40s/it]
{'loss': 1.207, 'learning_rate': 0.00024536585365853654, 'epoch': 2.76}
 55%|██████████████████████████████████████████ | 617/1115 [4:00:12<3:29:04, 25.19s/it]
{'loss': 1.0918, 'learning_rate': 0.0002448780487804878, 'epoch': 2.77}
{'loss': 1.1469, 'learning_rate': 0.00024439024390243897, 'epoch': 2.77}
{'loss': 0.9882, 'learning_rate': 0.00024390243902439022, 'epoch': 2.78}
 56%|██████████████████████████████████████████▎ | 620/1115 [4:01:26<3:25:06, 24.86s/it]
{'loss': 1.0057, 'learning_rate': 0.00024341463414634146, 'epoch': 2.78}
{'loss': 0.9869, 'learning_rate': 0.00024292682926829268, 'epoch': 2.78}
[WARNING|modeling_utils.py:388] 2022-03-25 23:40:49,081 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:40:59,465 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.9118, 'learning_rate': 0.00024243902439024387, 'epoch': 2.79}
 56%|██████████████████████████████████████████▍ | 623/1115 [4:02:38<3:17:53, 24.13s/it]
{'loss': 0.8895, 'learning_rate': 0.0002419512195121951, 'epoch': 2.79}
[WARNING|modeling_utils.py:388] 2022-03-25 23:41:31,960 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.7842, 'learning_rate': 0.00024146341463414633, 'epoch': 2.8}
{'loss': 0.8535, 'learning_rate': 0.00024097560975609755, 'epoch': 2.8}
[WARNING|modeling_bart.py:1051] 2022-03-25 23:42:37,680 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.9101, 'learning_rate': 0.00024048780487804876, 'epoch': 2.81}
{'loss': 0.9084, 'learning_rate': 0.00023999999999999998, 'epoch': 2.81}
[WARNING|modeling_utils.py:388] 2022-03-25 23:43:08,208 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:43:12,429 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.8409, 'learning_rate': 0.0002395121951219512, 'epoch': 2.82}
{'loss': 0.8037, 'learning_rate': 0.0002390243902439024, 'epoch': 2.82}
[WARNING|modeling_bart.py:1051] 2022-03-25 23:43:49,698 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-25 23:43:59,580 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
57%|██████████████████████████████████████████▉ | 630/1115 [4:05:18<3:02:09, 22.53s/it]
{'loss': 0.7994, 'learning_rate': 0.00023853658536585366, 'epoch': 2.83}
{'loss': 0.8201, 'learning_rate': 0.00023804878048780485, 'epoch': 2.83}
[WARNING|modeling_utils.py:388] 2022-03-25 23:44:46,489 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
57%|███████████████████████████████████████████ | 632/1115 [4:06:01<2:56:52, 21.97s/it]
{'loss': 0.7229, 'learning_rate': 0.00023756097560975606, 'epoch': 2.83}
[WARNING|modeling_utils.py:388] 2022-03-25 23:45:00,780 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:45:07,031 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.7606, 'learning_rate': 0.0002370731707317073, 'epoch': 2.84}
[WARNING|modeling_utils.py:388] 2022-03-25 23:45:17,065 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:45:23,243 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:45:29,326 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-25 23:45:33,847 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:45:39,894 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-25 23:45:43,921 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-25 23:45:48,182 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
57%|███████████████████████████████████████████▎ | 635/1115 [4:07:00<2:43:43, 20.47s/it]
[WARNING|modeling_utils.py:388] 2022-03-25 23:45:52,213 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-25 23:45:56,445 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-25 23:46:00,330 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:46:02,599 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-25 23:46:06,736 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
57%|███████████████████████████████████████████▎ | 636/1115 [4:07:19<2:38:39, 19.87s/it]
[WARNING|modeling_utils.py:388] 2022-03-25 23:46:10,620 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:46:12,832 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-25 23:46:16,833 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:46:19,021 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:46:21,141 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:46:23,288 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:46:25,395 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.7164, 'learning_rate': 0.00023512195121951215, 'epoch': 2.86}
[WARNING|modeling_utils.py:388] 2022-03-25 23:46:28,983 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:46:31,045 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:46:33,103 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-25 23:46:37,558 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:46:39,500 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:46:41,460 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
57%|███████████████████████████████████████████▍ | 638/1115 [4:07:53<2:27:15, 18.52s/it]
[WARNING|modeling_bart.py:1051] 2022-03-25 23:46:43,577 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:46:45,517 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:46:47,387 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:46:49,261 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:46:51,127 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:46:52,980 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:46:54,824 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
57%|███████████████████████████████████████████▌ | 639/1115 [4:08:08<2:18:42, 17.48s/it]
[WARNING|modeling_bart.py:1051] 2022-03-25 23:46:58,531 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:47:02,029 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:47:03,728 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:47:05,412 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:47:07,061 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:47:10,338 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
57%|███████████████████████████████████████████▌ | 640/1115 [4:08:22<2:09:10, 16.32s/it]
[WARNING|modeling_bart.py:1051] 2022-03-25 23:47:12,063 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:47:13,657 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:47:15,236 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:47:18,353 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:47:19,892 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:47:21,399 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
57%|███████████████████████████████████████████▋ | 641/1115 [4:08:34<1:59:51, 15.17s/it]
[WARNING|modeling_bart.py:1051] 2022-03-25 23:47:24,516 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:47:25,989 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:47:27,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:47:30,208 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:47:31,568 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:47:34,223 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
58%|███████████████████████████████████████████▊ | 642/1115 [4:08:45<1:50:07, 13.97s/it]
[WARNING|modeling_bart.py:1051] 2022-03-25 23:47:35,616 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:47:38,128 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:47:40,559 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:47:41,727 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:47:44,072 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:47:45,310 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:47:46,428 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:47:48,607 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:47:50,718 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:47:52,526 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
58%|███████████████████████████████████████████▉ | 644/1115 [4:09:05<1:31:54, 11.71s/it]
[WARNING|modeling_bart.py:1051] 2022-03-25 23:47:54,736 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:47:56,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:47:58,395 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:48:00,221 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:48:02,088 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:48:02,929 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:48:04,580 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:48:06,910 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
58%|████████████████████████████████████████████ | 646/1115 [4:09:18<1:12:09, 9.23s/it]
[WARNING|modeling_bart.py:1051] 2022-03-25 23:48:09,490 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:48:13,202 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:48:16,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:48:20,456 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:48:24,041 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:48:27,574 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:48:31,116 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:48:34,666 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
58%|████████████████████████████████████████████ | 647/1115 [4:09:47<1:58:10, 15.15s/it]
[WARNING|modeling_bart.py:1051] 2022-03-25 23:48:38,347 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:48:41,829 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:48:45,290 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:48:48,715 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:48:52,158 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:48:55,589 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:48:59,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:49:02,463 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
58%|████████████████████████████████████████████▏ | 648/1115 [4:10:15<2:27:09, 18.91s/it]
[WARNING|modeling_bart.py:1051] 2022-03-25 23:49:05,992 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:49:09,380 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:49:12,822 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:49:16,228 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:49:19,634 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:49:23,018 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:49:26,414 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:49:29,774 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
58%|████████████████████████████████████████████▏ | 649/1115 [4:10:42<2:46:22, 21.42s/it][WARNING|modeling_bart.py:1051] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 58%|████████████████████████████████████████████▏ | 649/1115 [4:10:42<2:46:22, 21.42s/it][WARNING|modeling_bart.py:1051] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 23:49:36,608 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 23:49:39,974 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 23:49:39,974 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 23:49:43,360 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 23:49:46,772 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 23:49:46,772 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 23:49:50,129 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 23:49:50,129 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 23:49:53,483 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 23:49:56,830 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 23:49:56,830 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 23:49:56,830 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 23:49:56,830 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 3.1011, 'learning_rate': 0.00022878048780487802, 'epoch': 2.91} [WARNING|modeling_bart.py:1051] 2022-03-25 23:49:56,830 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 23:49:56,830 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 23:49:56,830 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 23:49:56,830 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 23:49:56,830 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 23:49:56,830 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 23:49:56,830 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 23:49:56,830 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 23:49:56,830 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 23:49:56,830 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 23:49:56,830 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 23:49:56,830 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 23:49:56,830 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 2.1706, 'learning_rate': 0.00022829268292682924, 'epoch': 2.92} [WARNING|modeling_bart.py:1051] 2022-03-25 23:49:56,830 >> `use_cache=True` is incompatible with gradient checkpointing. 
[WARNING|modeling_bart.py:1051] 2022-03-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-25 23:49:56,830 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 1.8403, 'learning_rate': 0.00022780487804878048, 'epoch': 2.92}
59%|████████████████████████████████████████████▌ | 653/1115 [4:12:29<3:13:20, 25.11s/it]
{'loss': 1.7287, 'learning_rate': 0.00022731707317073167, 'epoch': 2.93}
{'loss': 1.5158, 'learning_rate': 0.00022682926829268292, 'epoch': 2.93}
{'loss': 1.2994, 'learning_rate': 0.00022634146341463413, 'epoch': 2.94}
{'loss': 1.099, 'learning_rate': 0.00022585365853658532, 'epoch': 2.94}
[WARNING|modeling_bart.py:1051] 2022-03-25 23:52:43,111 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
59%|████████████████████████████████████████████▊ | 657/1115 [4:14:09<3:11:49, 25.13s/it]
{'loss': 0.9544, 'learning_rate': 0.00022536585365853657, 'epoch': 2.95}
{'loss': 0.97, 'learning_rate': 0.00022487804878048778, 'epoch': 2.95}
59%|████████████████████████████████████████████▉ | 659/1115 [4:14:57<3:05:56, 24.47s/it]
{'loss': 0.9019, 'learning_rate': 0.000224390243902439, 'epoch': 2.96}
{'loss': 0.873, 'learning_rate': 0.00022390243902439022, 'epoch': 2.96}
[WARNING|modeling_bart.py:1051] 2022-03-25 23:54:17,899 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-25 23:54:24,000 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.8081, 'learning_rate': 0.00022341463414634146, 'epoch': 2.96}
[WARNING|modeling_utils.py:388] 2022-03-25 23:54:42,192 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:54:46,247 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
59%|█████████████████████████████████████████████ | 662/1115 [4:16:05<2:54:27, 23.11s/it]
{'loss': 0.8081, 'learning_rate': 0.00022292682926829265, 'epoch': 2.97}
[WARNING|modeling_utils.py:388] 2022-03-25 23:55:08,439 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
59%|█████████████████████████████████████████████▏ | 663/1115 [4:16:26<2:51:13, 22.73s/it]
[WARNING|modeling_utils.py:388] 2022-03-25 23:55:18,815 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:55:18,815 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:55:18,815 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 23:55:27,117 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 23:55:27,117 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 23:55:27,117 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 23:55:33,133 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 23:55:33,133 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-25 23:55:33,133 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:55:37,278 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:55:37,278 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 23:55:41,516 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-25 23:55:41,516 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:55:45,339 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:55:47,591 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-25 23:55:49,785 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:55:51,953 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-25 23:55:55,925 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-25 23:55:59,410 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:56:01,363 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:56:03,288 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:56:05,192 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:56:07,067 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:56:08,873 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:56:10,732 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:56:12,476 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:56:15,797 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:56:17,392 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:56:18,937 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:56:21,811 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:56:23,249 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:56:25,801 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:56:28,173 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:56:30,361 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:56:31,409 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:56:34,357 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:56:36,108 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:56:38,481 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:56:39,198 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:56:41,800 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:56:45,463 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:56:49,108 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:56:52,724 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:56:56,277 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:56:59,867 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-25 23:57:03,389 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:57:03,389 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:57:06,960 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:57:06,960 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:57:10,609 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:57:10,609 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:57:14,171 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-25 23:57:14,171 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:57:17,683 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:57:21,189 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:57:21,189 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 3.2184, 'learning_rate': 0.00021853658536585366, 'epoch': 3.01} [WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 1.9385, 'learning_rate': 0.00021804878048780485, 'epoch': 3.01} [WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 1.2234, 'learning_rate': 0.0002175609756097561, 'epoch': 3.02} [WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 1.2986, 'learning_rate': 0.0002170731707317073, 'epoch': 3.02} [WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-25 23:57:24,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 1.1385, 'learning_rate': 0.0002165853658536585, 'epoch': 3.03}
{'loss': 1.0251, 'learning_rate': 0.00021609756097560974, 'epoch': 3.03}
{'loss': 0.8948, 'learning_rate': 0.00021560975609756096, 'epoch': 3.04}
{'loss': 0.6973, 'learning_rate': 0.00021512195121951218, 'epoch': 3.04}
{'loss': 0.7551, 'learning_rate': 0.0002146341463414634, 'epoch': 3.04}
{'loss': 0.6294, 'learning_rate': 0.00021414634146341464, 'epoch': 3.05}
{'loss': 0.6347, 'learning_rate': 0.00021365853658536583, 'epoch': 3.05}
{'loss': 0.6635, 'learning_rate': 0.00021317073170731704, 'epoch': 3.06}
{'loss': 0.5707, 'learning_rate': 0.0002126829268292683, 'epoch': 3.06}
[WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:01:08,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 0.5126, 'learning_rate': 0.00021219512195121948, 'epoch': 3.07}
{'loss': 0.6055, 'learning_rate': 0.00021170731707317072, 'epoch': 3.07}
{'loss': 0.5535, 'learning_rate': 0.00021121951219512194, 'epoch': 3.08}
{'loss': 0.5278, 'learning_rate': 0.00021073170731707313, 'epoch': 3.08}
 62%|██████████████████████████████████████████████▉ | 688/1115 [4:26:16<3:02:06, 25.59s/it]
{'loss': 0.5102, 'learning_rate': 0.00021024390243902437, 'epoch': 3.09}
{'loss': 0.5113, 'learning_rate': 0.0002097560975609756, 'epoch': 3.09}
{'loss': 0.4592, 'learning_rate': 0.0002092682926829268, 'epoch': 3.09}
{'loss': 0.4637, 'learning_rate': 0.00020878048780487802, 'epoch': 3.1}
[WARNING|modeling_utils.py:388] 2022-03-26 00:06:27,231 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 00:06:44,048 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.4902, 'learning_rate': 0.00020829268292682927, 'epoch': 3.1}
 62%|███████████████████████████████████████████████▏ | 693/1115 [4:28:19<2:54:00, 24.74s/it]
{'loss': 0.4212, 'learning_rate': 0.00020780487804878046, 'epoch': 3.11}
 62%|███████████████████████████████████████████████▎ | 694/1115 [4:28:44<2:54:16, 24.84s/it]
{'loss': 0.4117, 'learning_rate': 0.00020731707317073167, 'epoch': 3.11}
{'loss': 0.4027, 'learning_rate': 0.00020682926829268292, 'epoch': 3.12}
{'loss': 0.4618, 'learning_rate': 0.0002063414634146341, 'epoch': 3.12}
[WARNING|modeling_utils.py:388] 2022-03-26 00:08:37,181 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.3627, 'learning_rate': 0.00020585365853658535, 'epoch': 3.13}
[WARNING|modeling_utils.py:388] 2022-03-26 00:08:47,519 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
 63%|███████████████████████████████████████████████▌ | 698/1115 [4:30:18<2:45:35, 23.83s/it]
{'loss': 0.4113, 'learning_rate': 0.00020536585365853657, 'epoch': 3.13}
[WARNING|modeling_utils.py:388] 2022-03-26 00:09:12,221 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.3982, 'learning_rate': 0.0002048780487804878, 'epoch': 3.13}
[WARNING|modeling_utils.py:388] 2022-03-26 00:09:36,573 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 00:09:53,123 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.3762, 'learning_rate': 0.000204390243902439, 'epoch': 3.14}
 63%|███████████████████████████████████████████████▊ | 701/1115 [4:31:27<2:40:51, 23.31s/it]
[WARNING|modeling_utils.py:388] 2022-03-26 00:10:38,093 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.3957, 'learning_rate': 0.00020341463414634146, 'epoch': 3.15}
[WARNING|modeling_utils.py:388] 2022-03-26 00:10:46,512 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 00:10:56,818 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.3483, 'learning_rate': 0.00020292682926829265, 'epoch': 3.15}
[WARNING|modeling_utils.py:388] 2022-03-26 00:11:18,687 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
 63%|███████████████████████████████████████████████▉ | 704/1115 [4:32:33<2:32:28, 22.26s/it]
{'loss': 0.3318, 'learning_rate': 0.0002024390243902439, 'epoch': 3.16}
[WARNING|modeling_utils.py:388] 2022-03-26 00:11:29,337 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:11:29,337 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:11:29,337 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:11:29,337 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:11:29,337 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:11:29,337 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:11:41,899 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 0.4213, 'learning_rate': 0.00020195121951219511, 'epoch': 3.16}
[WARNING|modeling_utils.py:388] 2022-03-26 00:11:47,616 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 00:11:57,824 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 00:12:04,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.368, 'learning_rate': 0.0002014634146341463, 'epoch': 3.17}
[WARNING|modeling_utils.py:388] 2022-03-26 00:12:08,019 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 00:12:16,284 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 00:12:20,792 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 00:12:24,881 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.3552, 'learning_rate': 0.00020097560975609755, 'epoch': 3.17}
[WARNING|modeling_utils.py:388] 2022-03-26 00:12:28,645 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 00:12:32,264 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 00:12:34,614 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 00:12:40,506 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 00:12:42,841 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 00:12:47,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 00:12:51,075 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 00:12:55,284 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 00:12:59,077 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 00:13:01,309 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 00:13:05,403 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:13:07,599 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:13:09,779 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 00:13:13,471 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 00:13:15,587 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 00:13:17,669 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 00:13:19,748 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 00:13:23,664 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:13:25,701 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:13:27,736 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:13:29,753 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:13:31,764 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:13:33,723 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:13:35,711 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:13:37,769 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:13:39,664 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:13:41,560 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:13:43,424 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:13:45,266 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:13:47,102 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:13:48,909 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:13:50,694 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:13:54,348 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:13:56,086 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:13:57,800 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:14:00,303 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:14:02,013 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:14:03,638 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:14:05,253 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:14:08,615 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:14:10,217 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:14:11,808 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:14:14,863 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:14:16,375 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:14:17,904 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:14:20,925 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:14:22,346 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:14:23,748 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:14:26,533 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:14:29,158 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:14:30,574 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:14:33,098 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:14:34,318 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:14:36,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:14:37,898 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:14:40,293 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:14:42,502 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:14:44,634 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:14:46,699 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:14:48,795 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:14:50,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:14:52,527 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:14:55,193 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:14:57,004 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:14:58,596 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:15:00,103 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:15:02,315 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:15:05,735 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:15:09,353 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:15:12,933 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:15:16,588 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:15:20,132 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:15:23,745 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:15:27,294 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:15:30,878 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 3.3472, 'learning_rate': 0.0001946341463414634, 'epoch': 3.23}
[WARNING|modeling_bart.py:1051] 2022-03-26 00:15:34,538 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:15:38,111 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:15:41,630 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:15:45,192 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:15:48,677 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:15:52,163 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:15:55,655 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:15:59,131 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 2.2989, 'learning_rate': 0.0001941463414634146, 'epoch': 3.23}
[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:02,755 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:06,230 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:09,700 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:13,164 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:16,650 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:20,099 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:23,586 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:27,099 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 1.2984, 'learning_rate': 0.00019365853658536583, 'epoch': 3.24}
[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:30,735 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:34,134 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:37,522 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:40,975 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:44,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:16:47,883 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.9822, 'learning_rate': 0.00019317073170731707, 'epoch': 3.24}
{'loss': 0.9058, 'learning_rate': 0.0001926829268292683, 'epoch': 3.25}
{'loss': 0.8659, 'learning_rate': 0.00019219512195121948, 'epoch': 3.25}
65%|█████████████████████████████████████████████████▍ | 726/1115 [4:39:28<2:49:11, 26.10s/it]
{'loss': 0.7585, 'learning_rate': 0.00019170731707317072, 'epoch': 3.26}
65%|█████████████████████████████████████████████████▌ | 727/1115 [4:39:55<2:50:14, 26.33s/it]
{'loss': 0.7159, 'learning_rate': 0.00019121951219512194, 'epoch': 3.26}
65%|█████████████████████████████████████████████████▌ | 728/1115 [4:40:22<2:50:29, 26.43s/it]
{'loss': 0.643, 'learning_rate': 0.00019073170731707316, 'epoch': 3.26}
{'loss': 0.5668, 'learning_rate': 0.00019024390243902437, 'epoch': 3.27}
{'loss': 0.5185, 'learning_rate': 0.00018975609756097562, 'epoch': 3.27}
{'loss': 0.5032, 'learning_rate': 0.0001892682926829268, 'epoch': 3.28}
{'loss': 0.478, 'learning_rate': 0.00018878048780487803, 'epoch': 3.28}
{'loss': 0.4333, 'learning_rate': 0.00018829268292682927, 'epoch': 3.29}
66%|██████████████████████████████████████████████████ | 734/1115 [4:42:59<2:45:49, 26.11s/it]
{'loss': 0.4642, 'learning_rate': 0.00018780487804878046, 'epoch': 3.29}
{'loss': 0.4117, 'learning_rate': 0.0001873170731707317, 'epoch': 3.3}
{'loss': 0.4252, 'learning_rate': 0.00018682926829268292, 'epoch': 3.3}
66%|██████████████████████████████████████████████████▏ | 737/1115 [4:44:15<2:40:45, 25.52s/it]
{'loss': 0.3947, 'learning_rate': 0.0001863414634146341, 'epoch': 3.3}
{'loss': 0.4007, 'learning_rate': 0.00018585365853658535, 'epoch': 3.31}
66%|██████████████████████████████████████████████████▎ | 739/1115 [4:45:06<2:39:15, 25.41s/it]
66%|██████████████████████████████████████████████████▍ | 740/1115 [4:45:31<2:37:31, 25.20s/it]
{'loss': 0.3837, 'learning_rate': 0.0001848780487804878, 'epoch': 3.32}
{'loss': 0.3415, 'learning_rate': 0.000184390243902439, 'epoch': 3.32}
{'loss': 0.4264, 'learning_rate': 0.00018390243902439025, 'epoch': 3.33}
{'loss': 0.3209, 'learning_rate': 0.00018341463414634144, 'epoch': 3.33}
67%|██████████████████████████████████████████████████▋ | 744/1115 [4:47:09<2:33:15, 24.79s/it]
{'loss': 0.3521, 'learning_rate': 0.00018292682926829266, 'epoch': 3.34}
{'loss': 0.3439, 'learning_rate': 0.0001824390243902439, 'epoch': 3.34}
[WARNING|modeling_utils.py:388] 2022-03-26 00:26:36,123 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.3283, 'learning_rate': 0.0001819512195121951, 'epoch': 3.35}
{'loss': 0.3709, 'learning_rate': 0.00018146341463414633, 'epoch': 3.35}
{'loss': 0.3629, 'learning_rate': 0.00018097560975609755, 'epoch': 3.35}
{'loss': 0.3261, 'learning_rate': 0.00018048780487804877, 'epoch': 3.36}
{'loss': 0.3054, 'learning_rate': 0.00017999999999999998, 'epoch': 3.36}
[WARNING|modeling_bart.py:1051] 2022-03-26 00:28:29,221 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:28:29,221 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:28:29,221 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:28:29,221 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:28:29,221 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:28:29,221 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:28:29,221 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.3341, 'learning_rate': 0.0001795121951219512, 'epoch': 3.37} [WARNING|modeling_bart.py:1051] 2022-03-26 00:28:29,221 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-26 00:28:29,221 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:28:29,221 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:28:29,221 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:28:53,556 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:28:53,556 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:28:53,556 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:28:53,556 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-25 23:49:33,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 00:29:01,821 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 00:29:06,057 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 00:29:14,303 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.3293, 'learning_rate': 0.00017853658536585363, 'epoch': 3.38}
[WARNING|modeling_utils.py:388] 2022-03-26 00:29:30,328 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 00:29:48,895 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.3783, 'learning_rate': 0.00017804878048780485, 'epoch': 3.38}
[WARNING|modeling_utils.py:388] 2022-03-26 00:29:59,302 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 00:30:03,240 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.3352, 'learning_rate': 0.0001775609756097561, 'epoch': 3.39}
[WARNING|modeling_bart.py:1051] 2022-03-26 00:30:15,751 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 00:30:23,882 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.275, 'learning_rate': 0.00017707317073170729, 'epoch': 3.39}
[WARNING|modeling_bart.py:1051] 2022-03-26 00:30:44,584 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
 68%|███████████████████████████████████████████████████▌ | 757/1115 [4:52:00<2:07:08, 21.31s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.3101, 'learning_rate': 0.00017658536585365853, 'epoch': 3.39}
[WARNING|modeling_bart.py:1051] 2022-03-26 00:30:54,631 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:30:57,080 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:31:00,776 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:31:06,846 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 00:31:11,002 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 00:31:16,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 00:31:19,267 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 00:31:23,468 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 00:31:27,297 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.2583, 'learning_rate': 0.00017560975609756094, 'epoch': 3.4}
[WARNING|modeling_bart.py:1051] 2022-03-26 00:31:31,591 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 00:31:35,406 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 00:31:37,652 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 00:31:39,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 00:31:43,823 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:31:45,965 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.2986, 'learning_rate': 0.00017512195121951218, 'epoch': 3.41}
[WARNING|modeling_utils.py:388] 2022-03-26 00:31:49,652 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 00:31:51,748 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 00:31:53,844 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 00:31:55,928 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 00:31:57,939 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 00:31:59,982 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 00:32:01,992 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 00:32:04,053 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 00:32:06,024 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 00:32:07,951 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 00:32:09,832 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 00:32:11,736 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 00:32:13,627 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 00:32:15,498 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 00:32:17,329 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 00:32:19,240 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 00:32:21,040 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 00:32:24,633 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 00:32:27,284 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 00:32:30,690 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:32:32,339 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:32:34,129 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:32:34,129 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:32:35,824 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:32:37,437 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:32:39,061 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 00:32:42,214 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:32:43,748 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:32:45,264 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:32:45,264 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:32:48,337 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:32:49,798 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:32:52,609 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 00:32:54,026 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:32:56,745 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:32:58,164 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:32:58,164 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:33:00,734 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:33:01,978 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:33:04,411 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 00:33:06,749 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:33:08,018 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:33:08,018 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:33:10,216 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:33:12,337 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:33:14,424 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:33:16,518 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 00:33:16,518 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:33:18,408 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:33:21,127 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:33:22,912 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:33:22,912 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:33:24,732 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:33:27,225 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 00:33:27,970 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:33:30,150 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:33:30,150 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:33:32,623 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:33:32,623 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:33:36,273 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:33:36,273 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 00:33:39,931 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:33:39,931 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:33:43,543 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:33:47,161 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:33:47,161 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:33:50,805 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:33:50,805 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 00:33:54,381 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:33:54,381 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:33:57,932 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:34:01,564 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:34:01,564 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 1.9187, 'learning_rate': 0.00017024390243902438, 'epoch': 3.45} [WARNING|modeling_utils.py:388] 2022-03-26 00:34:05,105 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:34:05,105 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 00:34:08,624 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:34:08,624 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:34:12,113 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:34:15,578 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:34:15,578 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:34:19,090 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:34:19,090 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 00:34:22,617 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:34:26,202 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:34:26,202 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 1.4796, 'learning_rate': 0.00016975609756097557, 'epoch': 3.46} [WARNING|modeling_utils.py:388] 2022-03-26 00:34:29,802 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:34:29,802 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:34:33,342 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:34:33,342 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 00:34:36,841 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:34:40,344 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:34:40,344 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:34:43,827 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:34:43,827 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:34:47,270 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:34:47,270 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 00:34:50,743 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:34:54,199 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:34:54,199 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.9913, 'learning_rate': 0.0001692682926829268, 'epoch': 3.46} [WARNING|modeling_utils.py:388] 2022-03-26 00:34:57,769 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:34:57,769 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:35:01,259 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:35:04,636 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 00:35:04,636 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 00:35:08,038 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 00:30:50,931 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 0.6812, 'learning_rate': 0.00016878048780487803, 'epoch': 3.47}
{'loss': 0.6404, 'learning_rate': 0.00016829268292682927, 'epoch': 3.47}
{'loss': 0.6294, 'learning_rate': 0.00016780487804878046, 'epoch': 3.48}
{'loss': 0.5524, 'learning_rate': 0.0001673170731707317, 'epoch': 3.48}
{'loss': 0.4834, 'learning_rate': 0.00016682926829268292, 'epoch': 3.48}
{'loss': 0.4867, 'learning_rate': 0.0001663414634146341, 'epoch': 3.49}
{'loss': 0.4394, 'learning_rate': 0.00016585365853658536, 'epoch': 3.49}
 70%|█████████████████████████████████████████████████████▏ | 780/1115 [4:59:42<2:26:39, 26.27s/it]
{'loss': 0.4228, 'learning_rate': 0.00016536585365853657, 'epoch': 3.5}
{'loss': 0.3963, 'learning_rate': 0.0001648780487804878, 'epoch': 3.5}
{'loss': 0.3693, 'learning_rate': 0.000164390243902439, 'epoch': 3.51}
{'loss': 0.3423, 'learning_rate': 0.00016390243902439025, 'epoch': 3.51}
{'loss': 0.3681, 'learning_rate': 0.00016341463414634144, 'epoch': 3.52}
[WARNING|modeling_utils.py:388] 2022-03-26 00:40:26,603 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
 70%|█████████████████████████████████████████████████████▌ | 785/1115 [5:01:51<2:21:48, 25.78s/it]
{'loss': 0.3625, 'learning_rate': 0.0001624390243902439, 'epoch': 3.52}
{'loss': 0.335, 'learning_rate': 0.0001619512195121951, 'epoch': 3.53}
{'loss': 0.3001, 'learning_rate': 0.00016146341463414634, 'epoch': 3.53}
{'loss': 0.3523, 'learning_rate': 0.00016097560975609755, 'epoch': 3.54}
[WARNING|modeling_utils.py:388] 2022-03-26 00:42:35,665 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.3084, 'learning_rate': 0.00016048780487804874, 'epoch': 3.54}
{'loss': 0.3502, 'learning_rate': 0.00015999999999999999, 'epoch': 3.55}
{'loss': 0.3246, 'learning_rate': 0.0001595121951219512, 'epoch': 3.55}
{'loss': 0.3061, 'learning_rate': 0.00015902439024390242, 'epoch': 3.56}
[WARNING|modeling_utils.py:388] 2022-03-26 00:44:08,106 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.3581, 'learning_rate': 0.00015853658536585364, 'epoch': 3.56}
{'loss': 0.3109, 'learning_rate': 0.00015804878048780488, 'epoch': 3.57}
71%|██████████████████████████████████████████████████████▎ | 796/1115 [5:06:21<2:08:05, 24.09s/it]
{'loss': 0.2814, 'learning_rate': 0.00015756097560975607, 'epoch': 3.57}
[WARNING|modeling_bart.py:1051] 2022-03-26 00:45:32,437 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
71%|██████████████████████████████████████████████████████▎ | 797/1115 [5:06:45<2:06:19, 23.83s/it]
{'loss': 0.2509, 'learning_rate': 0.0001570731707317073, 'epoch': 3.57}
{'loss': 0.3046, 'learning_rate': 0.00015658536585365853, 'epoch': 3.58}
[WARNING|modeling_utils.py:388] 2022-03-26 00:46:13,100 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.2995, 'learning_rate': 0.00015609756097560975, 'epoch': 3.58}
[WARNING|modeling_bart.py:1051] 2022-03-26 00:46:25,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:46:33,838 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.2679, 'learning_rate': 0.00015560975609756097, 'epoch': 3.59}
[WARNING|modeling_utils.py:388] 2022-03-26 00:46:56,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 00:47:00,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
72%|██████████████████████████████████████████████████████▌ | 801/1115 [5:08:16<2:01:00, 23.12s/it]
{'loss': 0.3044, 'learning_rate': 0.00015512195121951218, 'epoch': 3.59}
{'loss': 0.286, 'learning_rate': 0.00015463414634146343, 'epoch': 3.6}
[WARNING|modeling_utils.py:388] 2022-03-26 00:47:33,474 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 00:47:45,391 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
72%|██████████████████████████████████████████████████████▋ | 803/1115 [5:09:00<1:56:02, 22.32s/it]
{'loss': 0.2883, 'learning_rate': 0.00015414634146341462, 'epoch': 3.6}
[WARNING|modeling_bart.py:1051] 2022-03-26 00:47:58,092 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 00:48:03,767 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 00:48:07,690 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.3195, 'learning_rate': 0.00015365853658536583, 'epoch': 3.61}
[WARNING|modeling_utils.py:388] 2022-03-26 00:48:22,013 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 00:48:32,391 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 00:48:42,383 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 00:48:46,918 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 00:48:51,077 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.2843, 'learning_rate': 0.00015268292682926827, 'epoch': 3.61}
[WARNING|modeling_bart.py:1051] 2022-03-26 00:48:54,462 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:49:01,348 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:49:04,956 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:49:07,383 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.2142, 'learning_rate': 0.0001521951219512195, 'epoch': 3.62}
[WARNING|modeling_bart.py:1051] 2022-03-26 00:49:17,170 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 00:49:21,090 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 00:49:23,414 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 00:49:27,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:49:29,936 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.2466, 'learning_rate': 0.00015170731707317073, 'epoch': 3.62}
[WARNING|modeling_bart.py:1051] 2022-03-26 00:49:35,778 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:49:38,051 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 00:49:41,837 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 00:49:44,115 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 00:49:48,097 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.2987, 'learning_rate': 0.00015121951219512192, 'epoch': 3.63}
[WARNING|modeling_utils.py:388] 2022-03-26 00:49:51,995 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 00:49:54,193 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 00:49:56,341 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 00:49:58,478 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 00:50:00,571 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 00:50:02,660 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 00:50:04,747 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 00:50:08,636 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:50:10,663 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:50:12,686 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:50:14,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:50:16,674 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:50:18,627 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:50:20,570 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
 73%|███████████████████████████████████████████████████████▎ | 811/1115 [5:11:32<1:31:37, 18.08s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 00:50:22,585 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:50:24,486 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:50:26,392 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:50:28,284 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:50:30,129 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:50:31,977 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:50:33,811 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
 73%|███████████████████████████████████████████████████████▎ | 812/1115 [5:11:47<1:26:31, 17.13s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 00:50:37,449 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:50:39,206 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:50:40,923 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:50:42,629 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:50:45,172 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:50:46,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:50:48,531 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
 73%|███████████████████████████████████████████████████████▍ | 813/1115 [5:12:02<1:22:11, 16.33s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 00:50:51,911 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:50:53,513 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:50:55,065 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:50:56,594 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:50:58,135 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:51:01,173 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:51:02,630 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:51:04,167 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:51:05,548 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:51:06,921 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:51:09,603 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:51:10,932 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:51:13,526 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
 73%|███████████████████████████████████████████████████████▌ | 815/1115 [5:12:25<1:09:10, 13.83s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 00:51:14,919 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:51:17,392 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:51:19,782 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:51:20,932 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:51:23,210 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:51:24,428 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:51:25,515 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:51:27,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:51:29,781 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:51:31,777 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:51:32,846 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:51:33,803 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:51:35,644 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:51:37,403 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
 73%|█████████████████████████████████████████████████████████▏ | 818/1115 [5:12:50<50:11, 10.14s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 00:51:40,163 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:51:41,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:51:44,230 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:51:45,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
 73%|█████████████████████████████████████████████████████████▎ | 819/1115 [5:12:57<45:33, 9.23s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 00:51:48,367 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:51:52,152 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:51:55,825 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:51:59,451 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:52:03,066 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:52:06,643 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:52:10,212 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:52:13,786 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
 74%|███████████████████████████████████████████████████████▉ | 820/1115 [5:13:27<1:14:58, 15.25s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 00:52:17,453 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:52:20,975 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:52:24,523 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:52:28,077 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:52:31,573 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:52:35,074 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:52:38,563 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:52:42,038 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
 74%|███████████████████████████████████████████████████████▉ | 821/1115 [5:13:55<1:33:43, 19.13s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 00:52:45,619 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:52:49,067 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:52:52,565 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:52:55,994 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:52:59,461 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:02,923 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:06,388 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:09,807 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
 74%|████████████████████████████████████████████████████████ | 822/1115 [5:14:22<1:45:58, 21.70s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:16,642 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:20,057 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:23,436 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:26,816 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:30,251 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`...
{'loss': 0.5068, 'learning_rate': 0.000144390243902439, 'epoch': 3.69}
[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`...
{'loss': 0.473, 'learning_rate': 0.00014390243902439023, 'epoch': 3.7}
[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`...
{'loss': 0.4847, 'learning_rate': 0.00014341463414634144, 'epoch': 3.7}
[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`...
{'loss': 0.4595, 'learning_rate': 0.00014292682926829269, 'epoch': 3.7}
[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`...
{'loss': 0.3926, 'learning_rate': 0.00014243902439024388, 'epoch': 3.71}
[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`...
{'loss': 0.3979, 'learning_rate': 0.0001419512195121951, 'epoch': 3.71}
[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`...
{'loss': 0.3735, 'learning_rate': 0.00014146341463414634, 'epoch': 3.72}
[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`...
{'loss': 0.3384, 'learning_rate': 0.00014097560975609755, 'epoch': 3.72}
[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.3536, 'learning_rate': 0.00014048780487804877, 'epoch': 3.73} [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.3151, 'learning_rate': 0.00014, 'epoch': 3.73} [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.3147, 'learning_rate': 0.0001395121951219512, 'epoch': 3.74} [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.3133, 'learning_rate': 0.00013902439024390242, 'epoch': 3.74} [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.255, 'learning_rate': 0.00013853658536585364, 'epoch': 3.74} [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.2868, 'learning_rate': 0.00013804878048780486, 'epoch': 3.75} [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.2783, 'learning_rate': 0.0001375609756097561, 'epoch': 3.75} [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.2877, 'learning_rate': 0.00013707317073170732, 'epoch': 3.76} [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 00:53:13,306 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 00:53:33,708 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.2775, 'learning_rate': 0.0001365853658536585, 'epoch': 3.76}
{'loss': 0.2804, 'learning_rate': 0.00013609756097560975, 'epoch': 3.77}
{'loss': 0.2744, 'learning_rate': 0.00013560975609756097, 'epoch': 3.77}
{'loss': 0.2385, 'learning_rate': 0.00013512195121951218, 'epoch': 3.78}
{'loss': 0.2474, 'learning_rate': 0.0001346341463414634, 'epoch': 3.78}
[WARNING|modeling_utils.py:388] 2022-03-26 01:02:27,824 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.2529, 'learning_rate': 0.00013414634146341462, 'epoch': 3.78}
[WARNING|modeling_bart.py:1051] 2022-03-26 01:03:00,809 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
76%|█████████████████████████████████████████████████████████▌ | 845/1115 [5:24:15<1:49:39, 24.37s/it]
{'loss': 0.2812, 'learning_rate': 0.00013365853658536586, 'epoch': 3.79}
{'loss': 0.2556, 'learning_rate': 0.00013317073170731705, 'epoch': 3.79}
76%|█████████████████████████████████████████████████████████▋ | 847/1115 [5:25:02<1:46:42, 23.89s/it]
{'loss': 0.2562, 'learning_rate': 0.0001321951219512195, 'epoch': 3.8}
{'loss': 0.2345, 'learning_rate': 0.00013170731707317073, 'epoch': 3.81}
{'loss': 0.2582, 'learning_rate': 0.00013121951219512195, 'epoch': 3.81}
76%|██████████████████████████████████████████████████████████ | 851/1115 [5:26:34<1:41:50, 23.14s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.2171, 'learning_rate': 0.00013073170731707316, 'epoch': 3.82}
76%|██████████████████████████████████████████████████████████ | 851/1115 [5:26:34<1:41:50, 23.14s/it][WARNING|modeling_bart.py:1051] 2022-03-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 76%|██████████████████████████████████████████████████████████ | 851/1115 [5:26:34<1:41:50, 23.14s/it][WARNING|modeling_bart.py:1051] 2022-03-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 76%|██████████████████████████████████████████████████████████ | 851/1115 [5:26:34<1:41:50, 23.14s/it][WARNING|modeling_bart.py:1051] 2022-03-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 76%|██████████████████████████████████████████████████████████ | 851/1115 [5:26:34<1:41:50, 23.14s/it][WARNING|modeling_bart.py:1051] 2022-03-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 76%|██████████████████████████████████████████████████████████ | 851/1115 [5:26:34<1:41:50, 23.14s/it][WARNING|modeling_bart.py:1051] 2022-03-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 76%|██████████████████████████████████████████████████████████ | 851/1115 [5:26:34<1:41:50, 23.14s/it][WARNING|modeling_bart.py:1051] 2022-03-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 76%|██████████████████████████████████████████████████████████ | 851/1115 [5:26:34<1:41:50, 23.14s/it][WARNING|modeling_bart.py:1051] 2022-03-26 01:05:24,161 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 01:05:46,648 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:05:50,708 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 01:06:04,878 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
77%|██████████████████████████████████████████████████████████▏ | 853/1115 [5:27:17<1:37:50, 22.41s/it]
{'loss': 0.227, 'learning_rate': 0.0001297560975609756, 'epoch': 3.83}
[WARNING|modeling_utils.py:388] 2022-03-26 01:06:17,246 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 01:06:23,278 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.2222, 'learning_rate': 0.00012926829268292681, 'epoch': 3.83}
[WARNING|modeling_utils.py:388] 2022-03-26 01:06:39,324 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:06:45,700 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:06:49,617 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:06:55,984 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:07:02,294 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 01:07:08,012 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.235, 'learning_rate': 0.00012829268292682925, 'epoch': 3.84}
[WARNING|modeling_utils.py:388] 2022-03-26 01:07:18,056 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:07:24,171 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 01:07:28,534 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.2213, 'learning_rate': 0.0001278048780487805, 'epoch': 3.84}
[WARNING|modeling_bart.py:1051] 2022-03-26 01:07:34,719 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:07:37,059 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:07:42,891 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 01:07:46,824 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 01:07:51,207 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 01:07:55,075 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:07:57,314 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 01:08:01,432 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 01:08:05,142 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:08:07,448 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:08:09,618 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:08:11,746 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:08:13,873 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:08:16,049 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:08:18,173 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:08:20,250 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
77%|██████████████████████████████████████████████████████████▌ | 860/1115 [5:29:34<1:20:34, 18.96s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 01:08:24,126 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:08:26,146 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:08:28,174 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:08:30,139 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:08:32,089 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:08:34,032 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:08:35,976 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:08:37,912 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
77%|██████████████████████████████████████████████████████████▋ | 861/1115 [5:29:50<1:16:17, 18.02s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 01:08:39,907 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:08:41,774 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:08:43,619 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:08:45,464 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:08:47,299 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:08:50,893 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:08:52,666 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
77%|██████████████████████████████████████████████████████████▊ | 862/1115 [5:30:04<1:11:47, 17.02s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 01:08:54,558 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:08:56,337 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:08:54,558 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:08:58,042 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:08:54,558 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:08:59,759 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:08:54,558 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:09:02,262 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:08:54,558 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:09:03,945 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:08:54,558 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:09:05,600 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:08:54,558 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 77%|██████████████████████████████████████████████████████████▊ | 863/1115 [5:30:19<1:08:12, 16.24s/it] Setting `use_cache=False`...1] 2022-03-26 01:08:54,558 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
77%|██████████████████████████████████████████████████████████▊ | 863/1115 [5:30:19<1:08:12, 16.24s/it] Setting `use_cache=False`...1] 2022-03-26 01:08:54,558 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:09:10,545 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:09:08,958 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:09:12,084 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:09:08,958 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:09:13,617 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:09:08,958 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:09:16,620 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:09:08,958 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:09:18,114 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:09:08,958 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 77%|██████████████████████████████████████████████████████████▉ | 864/1115 [5:30:31<1:02:54, 15.04s/it][WARNING|modeling_bart.py:1051] 2022-03-26 01:09:21,075 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
77%|██████████████████████████████████████████████████████████▉ | 864/1115 [5:30:31<1:02:54, 15.04s/it][WARNING|modeling_bart.py:1051] 2022-03-26 01:09:21,075 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:09:22,474 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:09:21,075 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:09:25,154 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:09:21,075 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:09:26,480 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:09:21,075 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:09:29,053 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:09:21,075 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 78%|████████████████████████████████████████████████████████████▌ | 865/1115 [5:30:42<57:10, 13.72s/it] Setting `use_cache=False`...1] 2022-03-26 01:09:21,075 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 78%|████████████████████████████████████████████████████████████▌ | 865/1115 [5:30:42<57:10, 13.72s/it] Setting `use_cache=False`...1] 2022-03-26 01:09:21,075 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:09:32,922 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-26 01:09:31,691 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:09:35,281 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:09:31,691 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:09:36,446 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:09:31,691 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:09:38,704 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:09:31,691 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 78%|████████████████████████████████████████████████████████████▌ | 866/1115 [5:30:51<51:31, 12.41s/it][WARNING|modeling_bart.py:1051] 2022-03-26 01:09:40,982 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 78%|████████████████████████████████████████████████████████████▌ | 866/1115 [5:30:51<51:31, 12.41s/it][WARNING|modeling_bart.py:1051] 2022-03-26 01:09:40,982 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:09:43,091 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:09:40,982 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:09:45,103 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:09:40,982 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:09:47,044 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:09:40,982 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 78%|████████████████████████████████████████████████████████████▋ | 867/1115 [5:30:59<45:56, 11.12s/it][WARNING|modeling_bart.py:1051] 2022-03-26 01:09:48,999 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 78%|████████████████████████████████████████████████████████████▋ | 867/1115 [5:30:59<45:56, 11.12s/it][WARNING|modeling_bart.py:1051] 2022-03-26 01:09:48,999 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:09:50,804 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:09:48,999 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:09:53,499 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:09:48,999 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:09:55,217 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:09:48,999 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:09:55,217 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:09:48,999 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-26 01:09:56,980 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:09:56,157 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:09:59,283 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:09:56,157 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:10:01,466 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:09:56,157 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 78%|████████████████████████████████████████████████████████████▊ | 869/1115 [5:31:13<36:58, 9.02s/it] Setting `use_cache=False`...1] 2022-03-26 01:09:56,157 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 78%|████████████████████████████████████████████████████████████▊ | 869/1115 [5:31:13<36:58, 9.02s/it] Setting `use_cache=False`...1] 2022-03-26 01:09:56,157 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 78%|████████████████████████████████████████████████████████████▊ | 869/1115 [5:31:13<36:58, 9.02s/it][WARNING|modeling_bart.py:1051] 2022-03-26 01:10:04,055 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 78%|████████████████████████████████████████████████████████████▊ | 869/1115 [5:31:13<36:58, 9.02s/it][WARNING|modeling_bart.py:1051] 2022-03-26 01:10:04,055 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:10:07,715 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-26 01:10:04,055 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:10:11,352 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:10:04,055 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:10:11,352 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:10:04,055 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:10:14,923 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:10:04,055 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:10:14,923 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:10:04,055 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:10:18,462 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:10:04,055 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:10:21,988 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:10:04,055 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:10:21,988 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-26 01:10:04,055 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:10:25,527 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:10:04,055 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:10:25,527 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:10:04,055 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:10:29,044 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:10:04,055 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 78%|███████████████████████████████████████████████████████████▎ | 870/1115 [5:31:42<1:00:53, 14.91s/it] Setting `use_cache=False`...1] 2022-03-26 01:10:04,055 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 78%|███████████████████████████████████████████████████████████▎ | 870/1115 [5:31:42<1:00:53, 14.91s/it] Setting `use_cache=False`...1] 2022-03-26 01:10:04,055 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 78%|███████████████████████████████████████████████████████████▎ | 870/1115 [5:31:42<1:00:53, 14.91s/it][WARNING|modeling_bart.py:1051] 2022-03-26 01:10:32,573 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:10:36,040 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:10:32,573 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-26 01:10:36,040 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:10:32,573 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:10:39,472 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:10:32,573 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:10:39,472 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:10:32,573 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:10:42,861 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:10:32,573 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:10:46,241 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:10:32,573 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:10:46,241 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:10:32,573 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:10:49,626 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:10:32,573 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-26 01:10:49,626 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:10:32,573 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:10:52,982 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:10:32,573 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:10:56,403 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:10:32,573 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 78%|███████████████████████████████████████████████████████████▎ | 871/1115 [5:32:09<1:15:49, 18.64s/it] Setting `use_cache=False`...1] 2022-03-26 01:10:32,573 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 78%|███████████████████████████████████████████████████████████▎ | 871/1115 [5:32:09<1:15:49, 18.64s/it] Setting `use_cache=False`...1] 2022-03-26 01:10:32,573 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 78%|███████████████████████████████████████████████████████████▎ | 871/1115 [5:32:09<1:15:49, 18.64s/it][WARNING|modeling_bart.py:1051] 2022-03-26 01:10:59,915 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:11:03,245 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:10:59,915 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:11:03,245 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-26 01:10:59,915 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:11:06,621 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:10:59,915 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:11:06,621 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:10:59,915 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:11:09,969 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:10:59,915 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:11:13,324 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:10:59,915 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:11:13,324 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:10:59,915 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:11:16,679 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:10:59,915 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:11:16,679 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-26 01:10:59,915 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:11:19,976 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:10:59,915 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:11:23,282 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:10:59,915 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 78%|███████████████████████████████████████████████████████████▍ | 872/1115 [5:32:36<1:25:28, 21.11s/it] Setting `use_cache=False`...1] 2022-03-26 01:10:59,915 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 78%|███████████████████████████████████████████████████████████▍ | 872/1115 [5:32:36<1:25:28, 21.11s/it] Setting `use_cache=False`...1] 2022-03-26 01:10:59,915 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 78%|███████████████████████████████████████████████████████████▍ | 872/1115 [5:32:36<1:25:28, 21.11s/it][WARNING|modeling_bart.py:1051] 2022-03-26 01:11:26,760 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:11:30,081 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:26,760 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:11:30,081 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:26,760 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-26 01:11:33,444 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:26,760 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:11:33,444 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:26,760 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:11:36,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:26,760 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:11:40,126 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:26,760 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:11:40,126 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:26,760 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:11:43,417 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:26,760 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:11:43,417 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:26,760 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-26 01:11:46,740 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:26,760 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:11:50,042 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:26,760 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 78%|███████████████████████████████████████████████████████████▌ | 873/1115 [5:33:03<1:31:55, 22.79s/it] Setting `use_cache=False`...1] 2022-03-26 01:11:26,760 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 78%|███████████████████████████████████████████████████████████▌ | 873/1115 [5:33:03<1:31:55, 22.79s/it] Setting `use_cache=False`...1] 2022-03-26 01:11:26,760 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 78%|███████████████████████████████████████████████████████████▌ | 873/1115 [5:33:03<1:31:55, 22.79s/it][WARNING|modeling_bart.py:1051] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:11:56,736 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:11:56,736 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.3146, 'learning_rate': 0.0001195121951219512, 'epoch': 3.92} [WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...1] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...1] 2022-03-26 01:11:53,454 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... {'loss': 0.3146, 'learning_rate': 0.00011902439024390242, 'epoch': 3.92} [WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`... {'loss': 0.3472, 'learning_rate': 0.00011853658536585365, 'epoch': 3.93} [WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`... {'loss': 0.2634, 'learning_rate': 0.00011804878048780487, 'epoch': 3.93} [WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`... {'loss': 0.3482, 'learning_rate': 0.00011756097560975607, 'epoch': 3.94} [WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`... {'loss': 0.2643, 'learning_rate': 0.0001170731707317073, 'epoch': 3.94} [WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`... 79%|███████████████████████████████████████████████████████████▉ | 880/1115 [5:36:01<1:36:56, 24.75s/it] [WARNING|modeling_bart.py:1051] 2022-03-26 01:12:00,036 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`... {'loss': 0.2699, 'learning_rate': 0.00011658536585365852, 'epoch': 3.95} [WARNING|modeling_utils.py:388] 2022-03-26 01:14:54,704 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.2764, 'learning_rate': 0.00011609756097560974, 'epoch': 3.95}
{'loss': 0.2324, 'learning_rate': 0.00011560975609756097, 'epoch': 3.96}
{'loss': 0.2683, 'learning_rate': 0.00011512195121951219, 'epoch': 3.96} [WARNING|modeling_utils.py:388] 2022-03-26 01:16:16,876 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:16:26,770 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:16:34,824 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.2287, 'learning_rate': 0.00011414634146341462, 'epoch': 3.97}
[WARNING|modeling_utils.py:388] 2022-03-26 01:16:57,109 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:17:07,290 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:17:13,300 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 01:17:17,686 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:17:23,466 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
80%|████████████████████████████████████████████████████████████▍ | 887/1115 [5:38:35<1:20:55, 21.29s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 01:17:29,582 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:17:31,743 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:17:37,776 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:17:39,796 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:17:41,772 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:17:43,871 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:17:45,750 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:17:47,652 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:17:49,487 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:17:51,254 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:17:52,983 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:17:54,654 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:17:57,997 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:17:59,558 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:18:01,094 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:18:03,972 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:18:05,344 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:18:07,975 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:18:09,353 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:18:11,763 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:18:14,034 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:18:16,099 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:18:18,109 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:18:19,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:18:21,592 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:18:23,920 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:18:26,418 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:18:30,021 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:18:33,672 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:18:37,342 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:18:41,015 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:18:44,646 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:18:48,283 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:18:51,835 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.4579, 'learning_rate': 0.00011024390243902438, 'epoch': 4.0}
[WARNING|modeling_utils.py:388] 2022-03-26 01:18:55,506 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:18:59,099 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:19:02,585 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:19:06,085 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:19:09,593 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:19:13,061 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.4033, 'learning_rate': 0.0001097560975609756, 'epoch': 4.01}
{'loss': 0.313, 'learning_rate': 0.00010926829268292683, 'epoch': 4.01}
{'loss': 0.2394, 'learning_rate': 0.00010878048780487805, 'epoch': 4.02}
[WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.2394, 'learning_rate': 0.00010829268292682925, 'epoch': 4.02} [WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.2715, 'learning_rate': 0.00010780487804878048, 'epoch': 4.03} [WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 01:19:17,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.24, 'learning_rate': 0.0001073170731707317, 'epoch': 4.03} g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 0.2086, 'learning_rate': 0.00010682926829268291, 'epoch': 4.04}
{'loss': 0.2325, 'learning_rate': 0.00010634146341463414, 'epoch': 4.04}
{'loss': 0.1837, 'learning_rate': 0.00010585365853658536, 'epoch': 4.04}
{'loss': 0.1641, 'learning_rate': 0.00010536585365853656, 'epoch': 4.05}
81%|█████████████████████████████████████████████████████████████▌ | 904/1115 [5:45:03<1:32:39, 26.35s/it]
{'loss': 0.169, 'learning_rate': 0.0001048780487804878, 'epoch': 4.05}
{'loss': 0.1444, 'learning_rate': 0.00010439024390243901, 'epoch': 4.06}
{'loss': 0.1955, 'learning_rate': 0.00010390243902439023, 'epoch': 4.06}
{'loss': 0.1712, 'learning_rate': 0.00010341463414634146, 'epoch': 4.07}
{'loss': 0.1723, 'learning_rate': 0.00010292682926829268, 'epoch': 4.07}
{'loss': 0.1577, 'learning_rate': 0.0001024390243902439, 'epoch': 4.08}
{'loss': 0.1411, 'learning_rate': 0.00010195121951219511, 'epoch': 4.08}
{'loss': 0.147, 'learning_rate': 0.00010146341463414633, 'epoch': 4.09}
{'loss': 0.1688, 'learning_rate': 0.00010097560975609756, 'epoch': 4.09}
{'loss': 0.1425, 'learning_rate': 0.00010048780487804877, 'epoch': 4.09}
{'loss': 0.184, 'learning_rate': 9.999999999999999e-05, 'epoch': 4.1}
{'loss': 0.1604, 'learning_rate': 9.951219512195122e-05, 'epoch': 4.1}
{'loss': 0.1326, 'learning_rate': 9.902439024390243e-05, 'epoch': 4.11}
{'loss': 0.1505, 'learning_rate': 9.853658536585364e-05, 'epoch': 4.11}
[WARNING|modeling_utils.py:388] 2022-03-26 01:29:28,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1165, 'learning_rate': 9.804878048780487e-05, 'epoch': 4.12}
{'loss': 0.1217, 'learning_rate': 9.756097560975609e-05, 'epoch': 4.12}
[WARNING|modeling_utils.py:388] 2022-03-26 01:30:24,160 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
83%|██████████████████████████████████████████████████████████████▋ | 920/1115 [5:51:42<1:18:17, 24.09s/it]
{'loss': 0.1376, 'learning_rate': 9.70731707317073e-05, 'epoch': 4.13}
[WARNING|modeling_bart.py:1051] 2022-03-26 01:30:42,619 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 01:30:48,855 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
83%|██████████████████████████████████████████████████████████████▊ | 921/1115 [5:52:05<1:16:58, 23.81s/it]
{'loss': 0.1416, 'learning_rate': 9.658536585365854e-05, 'epoch': 4.13}
[WARNING|modeling_utils.py:388] 2022-03-26 01:31:16,003 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1483, 'learning_rate': 9.609756097560974e-05, 'epoch': 4.13}
83%|██████████████████████████████████████████████████████████████▉ | 923/1115 [5:52:50<1:14:21, 23.24s/it]
{'loss': 0.1441, 'learning_rate': 9.560975609756097e-05, 'epoch': 4.14}
83%|██████████████████████████████████████████████████████████████▉ | 924/1115 [5:53:13<1:13:13, 23.00s/it]
{'loss': 0.1459, 'learning_rate': 9.512195121951219e-05, 'epoch': 4.14}
[WARNING|modeling_utils.py:388] 2022-03-26 01:32:15,025 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:32:19,124 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:32:23,226 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.138, 'learning_rate': 9.46341463414634e-05, 'epoch': 4.15}
[WARNING|modeling_utils.py:388] 2022-03-26 01:32:41,631 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1196, 'learning_rate': 9.414634146341463e-05, 'epoch': 4.15}
[WARNING|modeling_bart.py:1051] 2022-03-26 01:33:06,339 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
83%|███████████████████████████████████████████████████████████████▏ | 927/1115 [5:54:18<1:09:19, 22.12s/it]
{'loss': 0.1039, 'learning_rate': 9.365853658536585e-05, 'epoch': 4.16}
[WARNING|modeling_utils.py:388] 2022-03-26 01:33:18,636 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:33:28,882 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1461, 'learning_rate': 9.317073170731706e-05, 'epoch': 4.16}
[WARNING|modeling_utils.py:388] 2022-03-26 01:33:32,820 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:33:39,169 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 01:33:47,499 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
83%|███████████████████████████████████████████████████████████████▎ | 929/1115 [5:55:00<1:06:01, 21.30s/it]
{'loss': 0.156, 'learning_rate': 9.268292682926829e-05, 'epoch': 4.17}
[WARNING|modeling_utils.py:388] 2022-03-26 01:33:55,570 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 01:33:59,997 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 01:34:04,041 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:34:10,138 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1005, 'learning_rate': 9.21951219512195e-05, 'epoch': 4.17}
[WARNING|modeling_bart.py:1051] 2022-03-26 01:34:17,938 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 01:34:21,896 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 01:34:26,106 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
83%|███████████████████████████████████████████████████████████████▍ | 931/1115 [5:55:38<1:01:58, 20.21s/it]
[WARNING|modeling_utils.py:388] 2022-03-26 01:34:30,074 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 01:34:38,518 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:34:40,784 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 01:34:44,556 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:34:46,772 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1158, 'learning_rate': 9.121951219512195e-05, 'epoch': 4.18}
[WARNING|modeling_bart.py:1051] 2022-03-26 01:34:50,877 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:34:53,059 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:34:55,151 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:34:57,213 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:34:59,269 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:35:01,328 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:35:03,404 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:35:05,588 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:35:07,604 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:35:09,596 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:35:11,570 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:35:13,542 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:35:15,471 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:35:17,369 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:35:19,253 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:35:21,268 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:35:23,123 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:35:24,965 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:35:26,785 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:35:28,590 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:35:32,130 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:35:33,871 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:35:35,720 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:35:37,429 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:35:39,147 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:35:42,529 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:35:44,176 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:35:45,816 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:35:47,424 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:35:50,690 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:35:52,244 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:35:53,751 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:35:56,747 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:35:58,207 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:35:59,630 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:36:02,528 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:36:03,901 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:36:05,268 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:36:08,781 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:36:10,070 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:36:12,789 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:36:14,050 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:36:16,546 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:36:18,926 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:36:21,179 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:36:22,384 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:36:24,505 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:36:26,603 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:36:28,585 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:36:30,583 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:36:32,406 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:36:34,192 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:36:36,851 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:36:38,620 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:36:40,950 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:36:43,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.1689, 'learning_rate': 8.634146341463413e-05, 'epoch': 4.22}
[WARNING|modeling_bart.py:1051] 2022-03-26 01:36:46,569 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:36:50,259 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:36:53,937 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:36:57,643 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:37:01,357 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:37:04,994 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:37:08,609 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:37:12,172 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:37:15,834 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:37:19,418 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:37:22,990 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:37:26,521 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:37:30,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:37:33,541 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:37:38,031 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:37:41,494 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.3293, 'learning_rate': 8.536585365853658e-05, 'epoch': 4.23}
[WARNING|modeling_bart.py:1051] 2022-03-26 01:37:45,185 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:37:48,590 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:37:52,114 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:37:55,583 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:37:59,101 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:38:02,565 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:38:06,053 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:38:09,517 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:38:13,066 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:38:16,436 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:38:19,828 >> `use_cache=True` is incompatible with gradient checkpointing.
Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:38:23,242 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:38:23,242 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:38:26,633 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:38:30,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:38:30,025 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:38:33,455 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:38:33,455 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:38:33,455 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:38:33,455 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.2403, 'learning_rate': 8.439024390243901e-05, 'epoch': 4.24} [WARNING|modeling_bart.py:1051] 2022-03-26 01:38:33,455 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:38:33,455 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:38:33,455 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:38:33,455 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:38:33,455 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:38:33,455 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:38:33,455 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:38:33,455 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:38:33,455 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:38:33,455 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 01:38:33,455 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 85%|████████████████████████████████████████████████████████████████▌ | 947/1115 [6:00:15<1:09:04, 24.67s/it] Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. 
Setting `use_cache=False`... 85%|████████████████████████████████████████████████████████████████▌ | 947/1115 [6:00:15<1:09:04, 24.67s/it] Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.2498, 'learning_rate': 8.390243902439023e-05, 'epoch': 4.25} 85%|████████████████████████████████████████████████████████████████▌ | 947/1115 [6:00:15<1:09:04, 24.67s/it] Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 85%|████████████████████████████████████████████████████████████████▌ | 947/1115 [6:00:15<1:09:04, 24.67s/it] Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 85%|████████████████████████████████████████████████████████████████▌ | 947/1115 [6:00:15<1:09:04, 24.67s/it] Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 85%|████████████████████████████████████████████████████████████████▌ | 947/1115 [6:00:15<1:09:04, 24.67s/it] Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 85%|████████████████████████████████████████████████████████████████▌ | 947/1115 [6:00:15<1:09:04, 24.67s/it] Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 85%|████████████████████████████████████████████████████████████████▌ | 947/1115 [6:00:15<1:09:04, 24.67s/it] Setting `use_cache=False`...e computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 0.1854, 'learning_rate': 8.341463414634146e-05, 'epoch': 4.25}
{'loss': 0.1997, 'learning_rate': 8.292682926829268e-05, 'epoch': 4.26}
{'loss': 0.2298, 'learning_rate': 8.24390243902439e-05, 'epoch': 4.26}
[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.1632, 'learning_rate': 8.195121951219513e-05, 'epoch': 4.26} [WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.1737, 'learning_rate': 8.146341463414633e-05, 'epoch': 4.27} [WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.1601, 'learning_rate': 8.097560975609755e-05, 'epoch': 4.27} [WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 01:40:45,318 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 0.1663, 'learning_rate': 8.048780487804878e-05, 'epoch': 4.28}
{'loss': 0.187, 'learning_rate': 7.999999999999999e-05, 'epoch': 4.28}
{'loss': 0.1445, 'learning_rate': 7.951219512195121e-05, 'epoch': 4.29}
{'loss': 0.1499, 'learning_rate': 7.902439024390244e-05, 'epoch': 4.29}
{'loss': 0.1406, 'learning_rate': 7.853658536585364e-05, 'epoch': 4.3}
{'loss': 0.1639, 'learning_rate': 7.804878048780487e-05, 'epoch': 4.3}
 86%|█████████████████████████████████████████████████████████████████▍ | 960/1115 [6:05:57<1:06:03, 25.57s/it]
{'loss': 0.1295, 'learning_rate': 7.756097560975609e-05, 'epoch': 4.3}
 86%|█████████████████████████████████████████████████████████████████▌ | 961/1115 [6:06:21<1:05:07, 25.38s/it]
{'loss': 0.1336, 'learning_rate': 7.707317073170731e-05, 'epoch': 4.31}
 86%|█████████████████████████████████████████████████████████████████▌ | 962/1115 [6:06:46<1:04:10, 25.16s/it]
86%|█████████████████████████████████████████████████████████████████▌ | 962/1115 [6:06:46<1:04:10, 25.16s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 86%|█████████████████████████████████████████████████████████████████▌ | 962/1115 [6:06:46<1:04:10, 25.16s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.1371, 'learning_rate': 7.609756097560976e-05, 'epoch': 4.32} 86%|█████████████████████████████████████████████████████████████████▌ | 962/1115 [6:06:46<1:04:10, 25.16s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 86%|█████████████████████████████████████████████████████████████████▌ | 962/1115 [6:06:46<1:04:10, 25.16s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 86%|█████████████████████████████████████████████████████████████████▌ | 962/1115 [6:06:46<1:04:10, 25.16s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 86%|█████████████████████████████████████████████████████████████████▌ | 962/1115 [6:06:46<1:04:10, 25.16s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 86%|█████████████████████████████████████████████████████████████████▌ | 962/1115 [6:06:46<1:04:10, 25.16s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
86%|█████████████████████████████████████████████████████████████████▌ | 962/1115 [6:06:46<1:04:10, 25.16s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 86%|█████████████████████████████████████████████████████████████████▌ | 962/1115 [6:06:46<1:04:10, 25.16s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 86%|█████████████████████████████████████████████████████████████████▌ | 962/1115 [6:06:46<1:04:10, 25.16s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 86%|█████████████████████████████████████████████████████████████████▌ | 962/1115 [6:06:46<1:04:10, 25.16s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 86%|█████████████████████████████████████████████████████████████████▌ | 962/1115 [6:06:46<1:04:10, 25.16s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 86%|█████████████████████████████████████████████████████████████████▋ | 964/1115 [6:07:36<1:03:04, 25.06s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 86%|█████████████████████████████████████████████████████████████████▋ | 964/1115 [6:07:36<1:03:04, 25.06s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 0.1307, 'learning_rate': 7.560975609756096e-05, 'epoch': 4.32}
{'loss': 0.1316, 'learning_rate': 7.512195121951219e-05, 'epoch': 4.33}
{'loss': 0.1358, 'learning_rate': 7.46341463414634e-05, 'epoch': 4.33}
{'loss': 0.1215, 'learning_rate': 7.414634146341462e-05, 'epoch': 4.34}
87%|███████████████████████████████████████████████████████████████████▋ | 968/1115 [6:09:12<59:14, 24.18s/it]
{'loss': 0.1351, 'learning_rate': 7.365853658536584e-05, 'epoch': 4.34}
87%|███████████████████████████████████████████████████████████████████▊ | 969/1115 [6:09:37<59:10, 24.32s/it]
{'loss': 0.121, 'learning_rate': 7.317073170731707e-05, 'epoch': 4.35}
[WARNING|modeling_utils.py:388] 2022-03-26 01:48:41,227 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:48:49,873 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.106, 'learning_rate': 7.21951219512195e-05, 'epoch': 4.35}
87%|███████████████████████████████████████████████████████████████████▉ | 972/1115 [6:10:47<56:35, 23.75s/it]
{'loss': 0.1224, 'learning_rate': 7.170731707317072e-05, 'epoch': 4.36}
87%|███████████████████████████████████████████████████████████████████▉ | 972/1115 [6:10:47<56:35, 23.75s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 87%|███████████████████████████████████████████████████████████████████▉ | 972/1115 [6:10:47<56:35, 23.75s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 87%|███████████████████████████████████████████████████████████████████▉ | 972/1115 [6:10:47<56:35, 23.75s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 87%|███████████████████████████████████████████████████████████████████▉ | 972/1115 [6:10:47<56:35, 23.75s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 87%|███████████████████████████████████████████████████████████████████▉ | 972/1115 [6:10:47<56:35, 23.75s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 87%|███████████████████████████████████████████████████████████████████▉ | 972/1115 [6:10:47<56:35, 23.75s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 87%|███████████████████████████████████████████████████████████████████▉ | 972/1115 [6:10:47<56:35, 23.75s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
87%|███████████████████████████████████████████████████████████████████▉ | 972/1115 [6:10:47<56:35, 23.75s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.1107, 'learning_rate': 7.121951219512194e-05, 'epoch': 4.36} 87%|███████████████████████████████████████████████████████████████████▉ | 972/1115 [6:10:47<56:35, 23.75s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 87%|███████████████████████████████████████████████████████████████████▉ | 972/1115 [6:10:47<56:35, 23.75s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 87%|███████████████████████████████████████████████████████████████████▉ | 972/1115 [6:10:47<56:35, 23.75s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 87%|███████████████████████████████████████████████████████████████████▉ | 972/1115 [6:10:47<56:35, 23.75s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 87%|███████████████████████████████████████████████████████████████████▉ | 972/1115 [6:10:47<56:35, 23.75s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 87%|███████████████████████████████████████████████████████████████████▉ | 972/1115 [6:10:47<56:35, 23.75s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 0.1096, 'learning_rate': 7.073170731707317e-05, 'epoch': 4.37}
{'loss': 0.1294, 'learning_rate': 7.024390243902439e-05, 'epoch': 4.37}
{'loss': 0.1206, 'learning_rate': 6.97560975609756e-05, 'epoch': 4.38}
[WARNING|modeling_utils.py:388] 2022-03-26 01:51:15,648 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:51:19,922 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:51:25,612 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
88%|████████████████████████████████████████████████████████████████████▎ | 977/1115 [6:12:46<53:59, 23.47s/it]
[WARNING|modeling_utils.py:388] 2022-03-26 01:51:51,996 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1112, 'learning_rate': 6.878048780487805e-05, 'epoch': 4.39}
[WARNING|modeling_utils.py:388] 2022-03-26 01:52:16,918 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1085, 'learning_rate': 6.829268292682925e-05, 'epoch': 4.39}
[WARNING|modeling_utils.py:388] 2022-03-26 01:52:32,780 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
88%|████████████████████████████████████████████████████████████████████▌ | 980/1115 [6:13:51<49:59, 22.22s/it]
[WARNING|modeling_utils.py:388] 2022-03-26 01:52:43,236 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:52:49,587 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:52:55,758 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:53:01,762 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 01:53:08,110 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 01:53:11,987 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:53:14,261 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 01:53:18,338 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
88%|████████████████████████████████████████████████████████████████████▋ | 982/1115 [6:14:30<46:16, 20.88s/it]
[WARNING|modeling_utils.py:388] 2022-03-26 01:53:22,224 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:53:24,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:53:26,606 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 01:53:30,544 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:53:32,661 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:53:34,755 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:53:36,863 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:53:39,060 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:53:41,162 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 01:53:43,222 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 01:53:46,672 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:53:48,664 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:53:50,617 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:53:52,580 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:53:54,649 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:53:56,550 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:53:58,449 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:54:00,329 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:54:02,163 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:54:03,975 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:54:05,789 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:54:07,579 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:54:11,210 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:54:12,940 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:54:14,651 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:54:16,343 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:54:18,037 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:54:19,677 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:54:23,029 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:54:24,627 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:54:26,196 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:54:29,304 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:54:30,803 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:54:32,284 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:54:35,273 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:54:36,699 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:54:39,413 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:54:41,466 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:54:42,776 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:54:45,269 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:54:46,662 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:54:49,053 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:54:51,385 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:54:53,669 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:54:54,784 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:54:57,048 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:54:59,139 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:55:01,179 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:55:03,114 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:55:05,087 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:55:07,778 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:55:09,522 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:55:11,314 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:55:13,754 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:55:15,270 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:55:16,759 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:55:19,121 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:55:22,817 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:55:26,492 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:55:30,159 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:55:33,751 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:55:37,340 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:55:40,874 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:55:44,434 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:55:48,129 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:55:51,666 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:55:55,195 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:55:58,769 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:56:02,333 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:56:05,864 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:56:10,346 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:56:13,759 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.3143, 'learning_rate': 6.0975609756097554e-05, 'epoch': 4.46}
[WARNING|modeling_utils.py:388] 2022-03-26 01:56:17,421 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:56:20,912 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:56:24,384 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:56:27,885 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:56:31,362 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:56:34,826 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:56:38,329 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:56:41,744 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:56:45,275 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:56:48,725 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:56:52,132 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:56:55,564 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:56:58,976 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:57:02,369 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 01:57:05,831 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1964, 'learning_rate': 5.9999999999999995e-05, 'epoch': 4.47}
{'loss': 0.2446, 'learning_rate': 5.951219512195121e-05, 'epoch': 4.47}
[WARNING|modeling_utils.py:388] 2022-03-26 01:57:05,831 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 01:57:05,831 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 01:57:05,831 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 01:57:05,831 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 01:57:05,831 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 01:57:05,831 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 01:57:05,831 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 0.1735, 'learning_rate': 5.9024390243902435e-05, 'epoch': 4.48}
[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642
[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2366] 2022-03-26 01:59:00,193 >> Num examples = 2642 g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
03/26/2022 02:08:27 - INFO - datasets.metric - Removing /home/sanchit_huggingface_co/.cache/huggingface/metrics/wer/default/default_experiment-1-0.arrow
{'eval_loss': 0.36502909660339355, 'eval_wer': 0.11207854026180088, 'eval_runtime': 567.0865, 'eval_samples_per_second': 4.659, 'eval_steps_per_second': 0.584, 'epoch': 4.48}
{'loss': 0.1683, 'learning_rate': 5.756097560975609e-05, 'epoch': 4.49}
{'loss': 0.1527, 'learning_rate': 5.707317073170731e-05, 'epoch': 4.49}
{'loss': 0.1248, 'learning_rate': 5.6585365853658533e-05, 'epoch': 4.5}
{'loss': 0.1299, 'learning_rate': 5.609756097560975e-05, 'epoch': 4.5}
{'loss': 0.1563, 'learning_rate': 5.560975609756097e-05, 'epoch': 4.51}
90%|███████████████████████████████████████████████████████████████████▋ | 1006/1115 [6:34:02<1:49:44, 60.41s/it]
{'loss': 0.1239, 'learning_rate': 5.512195121951219e-05, 'epoch': 4.51}
{'loss': 0.1481, 'learning_rate': 5.4634146341463415e-05, 'epoch': 4.52}
{'loss': 0.1316, 'learning_rate': 5.4146341463414625e-05, 'epoch': 4.52}
90%|███████████████████████████████████████████████████████████████████▊ | 1009/1115 [6:35:20<1:06:30, 37.64s/it]
{'loss': 0.1313, 'learning_rate': 5.365853658536585e-05, 'epoch': 4.52}
90%|███████████████████████████████████████████████████████████████████▊ | 1009/1115 [6:35:20<1:06:30, 37.64s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 90%|███████████████████████████████████████████████████████████████████▊ | 1009/1115 [6:35:20<1:06:30, 37.64s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.1185, 'learning_rate': 5.317073170731707e-05, 'epoch': 4.53} 90%|███████████████████████████████████████████████████████████████████▊ | 1009/1115 [6:35:20<1:06:30, 37.64s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 90%|███████████████████████████████████████████████████████████████████▊ | 1009/1115 [6:35:20<1:06:30, 37.64s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 90%|███████████████████████████████████████████████████████████████████▊ | 1009/1115 [6:35:20<1:06:30, 37.64s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 90%|███████████████████████████████████████████████████████████████████▊ | 1009/1115 [6:35:20<1:06:30, 37.64s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 90%|███████████████████████████████████████████████████████████████████▊ | 1009/1115 [6:35:20<1:06:30, 37.64s/it]g-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 0.1288, 'learning_rate': 5.268292682926828e-05, 'epoch': 4.53}
{'loss': 0.1273, 'learning_rate': 5.2195121951219506e-05, 'epoch': 4.54}
{'loss': 0.1418, 'learning_rate': 5.170731707317073e-05, 'epoch': 4.54}
91%|██████████████████████████████████████████████████████████████████████ | 1014/1115 [6:37:26<45:43, 27.16s/it]
{'loss': 0.1133, 'learning_rate': 5.121951219512195e-05, 'epoch': 4.55}
{'loss': 0.1178, 'learning_rate': 5.0731707317073163e-05, 'epoch': 4.55}
{'loss': 0.097, 'learning_rate': 5.024390243902439e-05, 'epoch': 4.56}
{'loss': 0.1185, 'learning_rate': 4.975609756097561e-05, 'epoch': 4.56}
91%|██████████████████████████████████████████████████████████████████████▎ | 1018/1115 [6:39:02<40:03, 24.77s/it]
[WARNING|modeling_utils.py:388] 2022-03-26 02:17:56,243 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1152, 'learning_rate': 4.8780487804878045e-05, 'epoch': 4.57}
{'loss': 0.1227, 'learning_rate': 4.829268292682927e-05, 'epoch': 4.57}
[WARNING|modeling_utils.py:388] 2022-03-26 02:17:56,243 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 02:17:56,243 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 02:17:56,243 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 02:17:56,243 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 02:17:56,243 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 02:17:56,243 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 02:17:56,243 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 0.1017, 'learning_rate': 4.7804878048780485e-05, 'epoch': 4.58} [WARNING|modeling_utils.py:388] 2022-03-26 02:17:56,243 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 02:17:56,243 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 02:17:56,243 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 02:17:56,243 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 02:17:56,243 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 02:17:56,243 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 02:17:56,243 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 02:17:56,243 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 02:17:56,243 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 02:17:56,243 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 02:17:56,243 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 02:17:56,243 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0982, 'learning_rate': 4.73170731707317e-05, 'epoch': 4.58} [WARNING|modeling_utils.py:388] 2022-03-26 02:17:56,243 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 02:17:56,243 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 02:17:56,243 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 02:17:56,243 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 02:17:56,243 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 02:17:56,243 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 02:17:56,243 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 02:17:56,243 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 02:17:56,243 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 01:17:25,864 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
92%|██████████████████████████████████████████████████████████████████████▋ | 1023/1115 [6:40:59<35:51, 23.39s/it]
{'loss': 0.1289, 'learning_rate': 4.6829268292682926e-05, 'epoch': 4.59}
92%|██████████████████████████████████████████████████████████████████████▋ | 1024/1115 [6:41:22<35:10, 23.19s/it]
{'loss': 0.0997, 'learning_rate': 4.634146341463414e-05, 'epoch': 4.59}
[WARNING|modeling_utils.py:388] 2022-03-26 02:20:19,644 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 02:20:23,831 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0879, 'learning_rate': 4.585365853658536e-05, 'epoch': 4.6}
[WARNING|modeling_bart.py:1051] 2022-03-26 02:20:54,484 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
92%|██████████████████████████████████████████████████████████████████████▊ | 1026/1115 [6:42:07<33:49, 22.81s/it]
{'loss': 0.1062, 'learning_rate': 4.48780487804878e-05, 'epoch': 4.61}
[WARNING|modeling_utils.py:388] 2022-03-26 02:21:20,529 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 02:21:30,948 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
92%|██████████████████████████████████████████████████████████████████████▉ | 1028/1115 [6:42:49<31:54, 22.00s/it]
{'loss': 0.0964, 'learning_rate': 4.4390243902439024e-05, 'epoch': 4.61}
[WARNING|modeling_utils.py:388] 2022-03-26 02:21:45,369 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 02:21:53,024 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 02:21:59,416 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0981, 'learning_rate': 4.3902439024390234e-05, 'epoch': 4.61}
[WARNING|modeling_utils.py:388] 2022-03-26 02:22:03,311 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 02:22:07,846 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:22:14,041 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
92%|███████████████████████████████████████████████████████████████████████▏ | 1030/1115 [6:43:30<29:52, 21.09s/it]
{'loss': 0.068, 'learning_rate': 4.341463414634146e-05, 'epoch': 4.62}
[WARNING|modeling_bart.py:1051] 2022-03-26 02:22:23,964 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 02:22:27,974 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 02:22:33,892 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 02:22:38,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.1051, 'learning_rate': 4.292682926829268e-05, 'epoch': 4.62}
[WARNING|modeling_utils.py:388] 2022-03-26 02:22:42,254 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 02:22:50,209 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 02:22:52,510 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 02:22:56,745 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
93%|███████████████████████████████████████████████████████████████████████▎ | 1032/1115 [6:44:09<28:01, 20.26s/it]
[WARNING|modeling_utils.py:388] 2022-03-26 02:23:00,731 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 02:23:04,856 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:23:07,115 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 02:23:10,790 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 02:23:12,927 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 02:23:15,111 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 02:23:19,072 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:23:21,144 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:23:23,237 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:23:25,297 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:23:27,343 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:23:29,359 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:23:31,363 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
93%|███████████████████████████████████████████████████████████████████████▍ | 1034/1115 [6:44:43<25:11, 18.66s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 02:23:33,456 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:23:35,419 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:23:37,362 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:23:39,285 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:23:41,198 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:23:43,089 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:23:44,984 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:23:46,866 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
93%|███████████████████████████████████████████████████████████████████████▍ | 1035/1115 [6:44:59<23:35, 17.70s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 02:23:48,862 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:23:50,710 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:23:52,532 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:23:56,112 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:23:57,861 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:23:59,548 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:24:01,219 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
93%|███████████████████████████████████████████████████████████████████████▌ | 1036/1115 [6:45:13<21:55, 16.65s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 02:24:03,003 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:24:06,266 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:24:07,872 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:24:09,470 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:24:11,034 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:24:12,581 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
93%|███████████████████████████████████████████████████████████████████████▌ | 1037/1115 [6:45:26<20:08, 15.50s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 02:24:15,753 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:24:17,276 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:24:20,183 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:24:23,805 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:24:25,157 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
93%|███████████████████████████████████████████████████████████████████████▋ | 1038/1115 [6:45:38<18:37, 14.52s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 02:24:27,964 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:24:29,250 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:24:31,780 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:24:34,193 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:24:36,539 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
93%|███████████████████████████████████████████████████████████████████████▊ | 1039/1115 [6:45:48<16:38, 13.14s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 02:24:37,794 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:24:40,048 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:24:42,217 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:24:44,281 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
93%|███████████████████████████████████████████████████████████████████████▊ | 1040/1115 [6:45:56<14:44, 11.79s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 02:24:46,353 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:24:48,267 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:24:50,117 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:24:51,929 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:24:53,794 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:24:54,646 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:24:56,347 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:24:58,651 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
93%|███████████████████████████████████████████████████████████████████████▉ | 1042/1115 [6:46:10<11:18, 9.29s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 02:25:01,294 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:25:05,044 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:25:08,669 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:25:12,337 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:25:15,950 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:25:19,584 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:25:23,152 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:25:26,751 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
94%|████████████████████████████████████████████████████████████████████████ | 1043/1115 [6:46:39<18:19, 15.27s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 02:25:30,393 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:25:33,928 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:25:37,512 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:25:41,022 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:25:44,508 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:25:48,051 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:25:52,555 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:25:56,060 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
94%|████████████████████████████████████████████████████████████████████████ | 1044/1115 [6:47:09<23:05, 19.51s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 02:25:59,819 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:26:03,303 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:26:06,803 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:26:10,245 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:26:13,681 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:26:17,165 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:26:20,606 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:26:24,039 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
94%|████████████████████████████████████████████████████████████████████████▏ | 1045/1115 [6:47:37<25:40, 22.01s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:26:30,947 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:26:34,334 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:26:37,819 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.2143, 'learning_rate': 3.560975609756097e-05, 'epoch': 4.69}
{'loss': 0.1886, 'learning_rate': 3.512195121951219e-05, 'epoch': 4.7}
{'loss': 0.2147, 'learning_rate': 3.463414634146341e-05, 'epoch': 4.7}
 94%|████████████████████████████████████████████████████████████████████████▍ | 1049/1115 [6:49:25<28:24, 25.83s/it]
{'loss': 0.1671, 'learning_rate': 3.365853658536585e-05, 'epoch': 4.71}
{'loss': 0.15, 'learning_rate': 3.317073170731707e-05, 'epoch': 4.71}
{'loss': 0.1564, 'learning_rate': 3.268292682926829e-05, 'epoch': 4.72}
{'loss': 0.1517, 'learning_rate': 3.219512195121951e-05, 'epoch': 4.72}
{'loss': 0.165, 'learning_rate': 3.170731707317073e-05, 'epoch': 4.73}
{'loss': 0.1202, 'learning_rate': 3.121951219512195e-05, 'epoch': 4.73}
{'loss': 0.1199, 'learning_rate': 3.0731707317073165e-05, 'epoch': 4.74}
{'loss': 0.1425, 'learning_rate': 3.024390243902439e-05, 'epoch': 4.74}
{'loss': 0.1322, 'learning_rate': 2.9756097560975606e-05, 'epoch': 4.74}
{'loss': 0.1356, 'learning_rate': 2.9268292682926826e-05, 'epoch': 4.75}
{'loss': 0.1227, 'learning_rate': 2.8780487804878046e-05, 'epoch': 4.75}
{'loss': 0.1052, 'learning_rate': 2.8292682926829267e-05, 'epoch': 4.76}
{'loss': 0.1255, 'learning_rate': 2.7804878048780484e-05, 'epoch': 4.76}
{'loss': 0.0951, 'learning_rate': 2.7317073170731707e-05, 'epoch': 4.77}
[WARNING|modeling_utils.py:388] 2022-03-26 02:34:30,912 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 02:34:30,912 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 02:34:30,912 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 02:34:30,912 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 02:34:30,912 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 02:34:30,912 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.1074, 'learning_rate': 2.6829268292682924e-05, 'epoch': 4.77} [WARNING|modeling_utils.py:388] 2022-03-26 02:34:30,912 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 02:34:30,912 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 02:34:30,912 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 02:34:30,912 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 02:34:30,912 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 02:34:30,912 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 02:34:30,912 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 02:34:30,912 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 02:34:30,912 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 02:34:30,912 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 02:35:06,110 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 02:35:06,110 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.1143, 'learning_rate': 2.634146341463414e-05, 'epoch': 4.78} [WARNING|modeling_utils.py:388] 2022-03-26 02:35:06,110 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 02:35:06,110 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 02:35:06,110 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 02:35:06,110 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 02:35:06,110 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 02:35:06,110 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 02:35:06,110 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 02:35:06,110 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 02:35:06,110 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 02:35:06,110 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 96%|█████████████████████████████████████████████████████████████████████████▌ | 1066/1115 [6:56:41<20:12, 24.75s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
96%|█████████████████████████████████████████████████████████████████████████▌ | 1066/1115 [6:56:41<20:12, 24.75s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.1112, 'learning_rate': 2.5853658536585365e-05, 'epoch': 4.78} 96%|█████████████████████████████████████████████████████████████████████████▌ | 1066/1115 [6:56:41<20:12, 24.75s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 96%|█████████████████████████████████████████████████████████████████████████▌ | 1066/1115 [6:56:41<20:12, 24.75s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 96%|█████████████████████████████████████████████████████████████████████████▌ | 1066/1115 [6:56:41<20:12, 24.75s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 96%|█████████████████████████████████████████████████████████████████████████▌ | 1066/1115 [6:56:41<20:12, 24.75s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 96%|█████████████████████████████████████████████████████████████████████████▌ | 1066/1115 [6:56:41<20:12, 24.75s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 96%|█████████████████████████████████████████████████████████████████████████▌ | 1066/1115 [6:56:41<20:12, 24.75s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
96%|█████████████████████████████████████████████████████████████████████████▌ | 1066/1115 [6:56:41<20:12, 24.75s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 96%|█████████████████████████████████████████████████████████████████████████▌ | 1066/1115 [6:56:41<20:12, 24.75s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 96%|█████████████████████████████████████████████████████████████████████████▌ | 1066/1115 [6:56:41<20:12, 24.75s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 96%|█████████████████████████████████████████████████████████████████████████▌ | 1066/1115 [6:56:41<20:12, 24.75s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 96%|█████████████████████████████████████████████████████████████████████████▌ | 1066/1115 [6:56:41<20:12, 24.75s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.1014, 'learning_rate': 2.5365853658536582e-05, 'epoch': 4.78} 96%|█████████████████████████████████████████████████████████████████████████▌ | 1066/1115 [6:56:41<20:12, 24.75s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 96%|█████████████████████████████████████████████████████████████████████████▌ | 1066/1115 [6:56:41<20:12, 24.75s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
96%|█████████████████████████████████████████████████████████████████████████▌ | 1066/1115 [6:56:41<20:12, 24.75s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 96%|█████████████████████████████████████████████████████████████████████████▌ | 1066/1115 [6:56:41<20:12, 24.75s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 96%|█████████████████████████████████████████████████████████████████████████▌ | 1066/1115 [6:56:41<20:12, 24.75s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 96%|█████████████████████████████████████████████████████████████████████████▌ | 1066/1115 [6:56:41<20:12, 24.75s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 96%|█████████████████████████████████████████████████████████████████████████▌ | 1066/1115 [6:56:41<20:12, 24.75s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 96%|█████████████████████████████████████████████████████████████████████████▌ | 1066/1115 [6:56:41<20:12, 24.75s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 96%|█████████████████████████████████████████████████████████████████████████▌ | 1066/1115 [6:56:41<20:12, 24.75s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
96%|█████████████████████████████████████████████████████████████████████████▌ | 1066/1115 [6:56:41<20:12, 24.75s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 96%|█████████████████████████████████████████████████████████████████████████▊ | 1068/1115 [6:57:28<19:01, 24.28s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 96%|█████████████████████████████████████████████████████████████████████████▊ | 1068/1115 [6:57:28<19:01, 24.28s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0987, 'learning_rate': 2.4878048780487805e-05, 'epoch': 4.79} 96%|█████████████████████████████████████████████████████████████████████████▊ | 1068/1115 [6:57:28<19:01, 24.28s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 96%|█████████████████████████████████████████████████████████████████████████▊ | 1068/1115 [6:57:28<19:01, 24.28s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 96%|█████████████████████████████████████████████████████████████████████████▊ | 1068/1115 [6:57:28<19:01, 24.28s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 02:36:28,486 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 02:36:28,486 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 02:36:28,486 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 02:36:28,486 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 02:36:28,486 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 02:36:28,486 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 02:36:28,486 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 02:36:28,486 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
96%|█████████████████████████████████████████████████████████████████████████▊ | 1069/1115 [6:57:53<18:40, 24.37s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 96%|█████████████████████████████████████████████████████████████████████████▊ | 1069/1115 [6:57:53<18:40, 24.37s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 96%|█████████████████████████████████████████████████████████████████████████▊ | 1069/1115 [6:57:53<18:40, 24.37s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 96%|█████████████████████████████████████████████████████████████████████████▊ | 1069/1115 [6:57:53<18:40, 24.37s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 96%|█████████████████████████████████████████████████████████████████████████▊ | 1069/1115 [6:57:53<18:40, 24.37s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 96%|█████████████████████████████████████████████████████████████████████████▊ | 1069/1115 [6:57:53<18:40, 24.37s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 96%|█████████████████████████████████████████████████████████████████████████▊ | 1069/1115 [6:57:53<18:40, 24.37s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
96%|█████████████████████████████████████████████████████████████████████████▊ | 1069/1115 [6:57:53<18:40, 24.37s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 96%|█████████████████████████████████████████████████████████████████████████▊ | 1069/1115 [6:57:53<18:40, 24.37s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 96%|█████████████████████████████████████████████████████████████████████████▊ | 1069/1115 [6:57:53<18:40, 24.37s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 96%|█████████████████████████████████████████████████████████████████████████▊ | 1069/1115 [6:57:53<18:40, 24.37s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 96%|█████████████████████████████████████████████████████████████████████████▊ | 1069/1115 [6:57:53<18:40, 24.37s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 96%|█████████████████████████████████████████████████████████████████████████▊ | 1069/1115 [6:57:53<18:40, 24.37s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.1075, 'learning_rate': 2.3902439024390243e-05, 'epoch': 4.8} 96%|█████████████████████████████████████████████████████████████████████████▊ | 1069/1115 [6:57:53<18:40, 24.37s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
96%|█████████████████████████████████████████████████████████████████████████▊ | 1069/1115 [6:57:53<18:40, 24.37s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 96%|█████████████████████████████████████████████████████████████████████████▊ | 1069/1115 [6:57:53<18:40, 24.37s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 96%|█████████████████████████████████████████████████████████████████████████▊ | 1069/1115 [6:57:53<18:40, 24.37s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 96%|█████████████████████████████████████████████████████████████████████████▊ | 1069/1115 [6:57:53<18:40, 24.37s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 96%|█████████████████████████████████████████████████████████████████████████▊ | 1069/1115 [6:57:53<18:40, 24.37s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 96%|█████████████████████████████████████████████████████████████████████████▊ | 1069/1115 [6:57:53<18:40, 24.37s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 96%|█████████████████████████████████████████████████████████████████████████▊ | 1069/1115 [6:57:53<18:40, 24.37s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
96%|█████████████████████████████████████████████████████████████████████████▊ | 1069/1115 [6:57:53<18:40, 24.37s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 96%|█████████████████████████████████████████████████████████████████████████▊ | 1069/1115 [6:57:53<18:40, 24.37s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 96%|█████████████████████████████████████████████████████████████████████████▊ | 1069/1115 [6:57:53<18:40, 24.37s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.1121, 'learning_rate': 2.3414634146341463e-05, 'epoch': 4.8} 96%|█████████████████████████████████████████████████████████████████████████▊ | 1069/1115 [6:57:53<18:40, 24.37s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 96%|█████████████████████████████████████████████████████████████████████████▊ | 1069/1115 [6:57:53<18:40, 24.37s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 96%|█████████████████████████████████████████████████████████████████████████▊ | 1069/1115 [6:57:53<18:40, 24.37s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 96%|█████████████████████████████████████████████████████████████████████████▊ | 1069/1115 [6:57:53<18:40, 24.37s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 02:37:40,789 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 02:37:40,789 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 02:37:40,789 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 02:37:40,789 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 02:37:40,789 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 02:37:40,789 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 96%|██████████████████████████████████████████████████████████████████████████ | 1072/1115 [6:59:02<16:49, 23.48s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
96%|██████████████████████████████████████████████████████████████████████████ | 1072/1115 [6:59:02<16:49, 23.48s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.0914, 'learning_rate': 2.292682926829268e-05, 'epoch': 4.81} 96%|██████████████████████████████████████████████████████████████████████████ | 1072/1115 [6:59:02<16:49, 23.48s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 96%|██████████████████████████████████████████████████████████████████████████ | 1072/1115 [6:59:02<16:49, 23.48s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 96%|██████████████████████████████████████████████████████████████████████████ | 1072/1115 [6:59:02<16:49, 23.48s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 96%|██████████████████████████████████████████████████████████████████████████ | 1072/1115 [6:59:02<16:49, 23.48s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 96%|██████████████████████████████████████████████████████████████████████████ | 1072/1115 [6:59:02<16:49, 23.48s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 96%|██████████████████████████████████████████████████████████████████████████ | 1072/1115 [6:59:02<16:49, 23.48s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
96%|██████████████████████████████████████████████████████████████████████████ | 1072/1115 [6:59:02<16:49, 23.48s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 96%|██████████████████████████████████████████████████████████████████████████ | 1072/1115 [6:59:02<16:49, 23.48s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 96%|██████████████████████████████████████████████████████████████████████████ | 1072/1115 [6:59:02<16:49, 23.48s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 96%|██████████████████████████████████████████████████████████████████████████ | 1072/1115 [6:59:02<16:49, 23.48s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 96%|██████████████████████████████████████████████████████████████████████████ | 1072/1115 [6:59:02<16:49, 23.48s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.1099, 'learning_rate': 2.24390243902439e-05, 'epoch': 4.81} 96%|██████████████████████████████████████████████████████████████████████████ | 1072/1115 [6:59:02<16:49, 23.48s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 96%|██████████████████████████████████████████████████████████████████████████ | 1072/1115 [6:59:02<16:49, 23.48s/it]g-point operations will not be computed-26 02:26:27,605 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-26 02:38:23,951 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.0843, 'learning_rate': 2.1951219512195117e-05, 'epoch': 4.82}
{'loss': 0.1039, 'learning_rate': 2.146341463414634e-05, 'epoch': 4.82}
[WARNING|modeling_utils.py:388] 2022-03-26 02:39:14,750 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.1008, 'learning_rate': 2.0975609756097558e-05, 'epoch': 4.83}
[WARNING|modeling_utils.py:388] 2022-03-26 02:39:29,334 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 02:39:37,209 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0954, 'learning_rate': 2.048780487804878e-05, 'epoch': 4.83}
[WARNING|modeling_bart.py:1051] 2022-03-26 02:39:49,941 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 02:39:55,630 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
 97%|██████████████████████████████████████████████████████████████████████████▍ | 1078/1115 [7:01:14<13:24, 21.73s/it]
{'loss': 0.0986, 'learning_rate': 1.9999999999999998e-05, 'epoch': 4.83}
[WARNING|modeling_utils.py:388] 2022-03-26 02:40:09,909 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 02:40:20,028 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
 97%|██████████████████████████████████████████████████████████████████████████▌ | 1079/1115 [7:01:34<12:47, 21.33s/it]
[WARNING|modeling_utils.py:388] 2022-03-26 02:40:26,406 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 02:40:32,558 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 02:40:38,707 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 02:40:44,913 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 02:40:50,907 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 02:40:55,246 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 02:40:59,231 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 02:41:01,604 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0804, 'learning_rate': 1.8536585365853656e-05, 'epoch': 4.85}
[WARNING|modeling_utils.py:388] 2022-03-26 02:41:11,858 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 02:41:15,906 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:41:18,119 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 02:41:21,845 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0896, 'learning_rate': 1.8048780487804876e-05, 'epoch': 4.85}
[WARNING|modeling_bart.py:1051] 2022-03-26 02:41:26,027 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:41:28,212 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:41:30,353 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_utils.py:388] 2022-03-26 02:41:33,933 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_bart.py:1051] 2022-03-26 02:41:37,731 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
 97%|██████████████████████████████████████████████████████████████████████████▊ | 1083/1115 [7:02:49<10:13, 19.18s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 02:41:39,895 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:41:41,963 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:41:44,005 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:41:46,071 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:41:48,076 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:41:50,081 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:41:52,012 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:41:53,942 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
 97%|██████████████████████████████████████████████████████████████████████████▊ | 1084/1115 [7:03:06<09:25, 18.26s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 02:41:55,953 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:41:57,837 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:41:59,697 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:42:01,551 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:42:03,399 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:42:05,215 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:42:08,788 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
 97%|██████████████████████████████████████████████████████████████████████████▉ | 1085/1115 [7:03:20<08:36, 17.20s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 02:42:10,648 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:42:12,390 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:42:14,092 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:42:15,803 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:42:17,492 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:42:20,788 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:42:22,423 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
 97%|██████████████████████████████████████████████████████████████████████████▉ | 1086/1115 [7:03:34<07:47, 16.11s/it]
[WARNING|modeling_bart.py:1051] 2022-03-26 02:42:24,143 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:42:25,753 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:42:28,880 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 02:42:30,423 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 02:42:31,934 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 02:42:34,946 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 97%|███████████████████████████████████████████████████████████████████████████ | 1087/1115 [7:03:46<06:59, 15.00s/it] [WARNING|modeling_bart.py:1051] 2022-03-26 02:42:36,507 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 02:42:39,292 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:42:40,668 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 02:42:42,852 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 02:42:45,457 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 02:42:46,751 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 02:42:49,485 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 02:42:50,738 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:42:53,238 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 02:42:55,643 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 98%|███████████████████████████████████████████████████████████████████████████▏ | 1089/1115 [7:04:08<05:32, 12.81s/it] [WARNING|modeling_bart.py:1051] 2022-03-26 02:42:59,234 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 02:43:01,456 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 02:43:03,621 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:43:05,682 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 02:43:07,776 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 02:43:09,641 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 02:43:11,445 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 02:43:13,252 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:43:15,070 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 02:43:17,532 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 02:43:19,758 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 98%|███████████████████████████████████████████████████████████████████████████▍ | 1092/1115 [7:04:31<03:31, 9.18s/it] [WARNING|modeling_bart.py:1051] 2022-03-26 02:43:21,733 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 02:43:25,485 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:43:29,051 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 02:43:32,659 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 02:43:36,186 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 02:43:39,656 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:43:43,205 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 02:43:46,693 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 98%|███████████████████████████████████████████████████████████████████████████▍ | 1093/1115 [7:04:59<05:30, 15.04s/it] [WARNING|modeling_bart.py:1051] 2022-03-26 02:43:50,266 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:43:53,766 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 02:43:57,196 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 02:44:00,551 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 02:44:03,978 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:44:07,410 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 02:44:11,840 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 02:44:15,271 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 98%|███████████████████████████████████████████████████████████████████████████▌ | 1094/1115 [7:05:28<06:40, 19.09s/it] [WARNING|modeling_bart.py:1051] 2022-03-26 02:44:18,820 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.1688, 'learning_rate': 1.2195121951219511e-05, 'epoch': 4.91} [WARNING|modeling_bart.py:1051] 2022-03-26 02:44:22,196 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 02:44:25,589 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 02:44:28,968 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 02:44:32,324 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:44:35,709 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 02:44:39,061 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 02:44:42,432 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 98%|███████████████████████████████████████████████████████████████████████████▌ | 1095/1115 [7:05:55<07:09, 21.49s/it] [WARNING|modeling_bart.py:1051] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:44:49,276 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 02:44:52,703 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 02:44:56,038 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 02:44:59,387 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
[WARNING|modeling_bart.py:1051] 2022-03-26 02:45:02,745 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 02:45:06,013 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 02:45:09,338 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it]
98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.1589, 'learning_rate': 1.073170731707317e-05, 'epoch': 4.92} 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.1198, 'learning_rate': 1.024390243902439e-05, 'epoch': 4.92} 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 98%|███████████████████████████████████████████████████████████████████████████▋ | 1096/1115 [7:06:22<07:19, 23.12s/it] Setting `use_cache=False`...1] 2022-03-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
{'loss': 0.1327, 'learning_rate': 9.75609756097561e-06, 'epoch': 4.93}
{'loss': 0.1172, 'learning_rate': 9.268292682926828e-06, 'epoch': 4.93}
{'loss': 0.1378, 'learning_rate': 8.780487804878048e-06, 'epoch': 4.94}
{'loss': 0.1146, 'learning_rate': 8.292682926829267e-06, 'epoch': 4.94}
{'loss': 0.1076, 'learning_rate': 7.804878048780487e-06, 'epoch': 4.95}
[WARNING|modeling_utils.py:388] 2022-03-26 02:48:21,867 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 02:48:32,235 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.0927, 'learning_rate': 7.3170731707317065e-06, 'epoch': 4.95}
[WARNING|modeling_bart.py:1051] 2022-03-26 02:49:01,163 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 0.1055, 'learning_rate': 6.3414634146341454e-06, 'epoch': 4.96}
{'loss': 0.0898, 'learning_rate': 5.853658536585366e-06, 'epoch': 4.96}
[WARNING|modeling_utils.py:388] 2022-03-26 02:49:49,763 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 02:49:53,842 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 02:49:57,836 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
99%|████████████████████████████████████████████████████████████████████████████▌| 1108/1115 [7:11:16<02:42, 23.24s/it]
99%|████████████████████████████████████████████████████████████████████████████▌| 1108/1115 [7:11:16<02:42, 23.24s/it]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 99%|████████████████████████████████████████████████████████████████████████████▌| 1108/1115 [7:11:16<02:42, 23.24s/it]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 99%|████████████████████████████████████████████████████████████████████████████▌| 1108/1115 [7:11:16<02:42, 23.24s/it]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 02:50:20,332 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 02:50:20,332 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 02:50:20,332 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 02:50:20,332 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 02:50:20,332 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... {'loss': 0.1008, 'learning_rate': 4.878048780487805e-06, 'epoch': 4.97} [WARNING|modeling_utils.py:388] 2022-03-26 02:50:30,623 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 02:50:30,623 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 02:50:30,623 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 02:50:36,743 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 02:50:36,743 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 02:50:36,743 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 02:50:36,743 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 02:50:44,695 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 02:50:44,695 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 100%|████████████████████████████████████████████████████████████████████████████▋| 1110/1115 [7:11:57<01:48, 21.65s/it] Setting `use_cache=False`...e computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 02:50:48,722 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 02:50:48,722 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_bart.py:1051] 2022-03-26 02:50:52,875 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_bart.py:1051] 2022-03-26 02:50:52,875 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...e computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 02:50:56,638 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 02:50:58,786 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 02:51:00,926 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 02:51:02,984 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 02:51:02,984 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [WARNING|modeling_utils.py:388] 2022-03-26 02:51:05,131 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[WARNING|modeling_utils.py:388] 2022-03-26 02:51:07,135 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 02:51:09,072 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 02:51:10,936 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 02:51:12,796 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 02:51:14,613 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 02:51:16,415 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 02:51:18,183 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 02:51:19,993 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 02:51:21,658 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 02:51:24,842 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 02:51:27,146 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 02:51:28,625 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 02:51:31,371 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 02:51:32,844 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 02:51:35,320 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 02:51:36,495 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 02:51:38,726 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 02:51:40,782 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 02:51:42,880 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 02:51:45,579 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
[WARNING|modeling_utils.py:388] 2022-03-26 02:51:47,212 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.2069, 'learning_rate': 1.9512195121951218e-06, 'epoch': 5.0}
100%|█████████████████████████████████████████████████████████████████████████████| 1115/1115 [7:12:59<00:00, 23.30s/it]
[INFO|feature_extraction_utils.py:324] 2022-03-26 02:52:00,252 >> Configuration saved in ./preprocessor_config.json
[INFO|feature_extraction_utils.py:324] 2022-03-26 02:52:12,187 >> Configuration saved in ./preprocessor_config.json
03/26/2022 02:55:30 - WARNING - huggingface_hub.repository - To https://huggingface.co/sanchit-gandhi/wav2vec2-2-bart-large-cnn
Upload file runs/Mar25_19-38-20_sanchit--v100/events.out.tfevents.1648237127.sanchit--v100.1762967.0: 100%|█| 181k/181k
{'dataset': {'name': 'librispeech_asr', 'type': 'librispeech_asr', 'args': 'clean'}}
03/26/2022 02:55:49 - WARNING - huggingface_hub.repository - To https://huggingface.co/sanchit-gandhi/wav2vec2-2-bart-large-cnn
Upload file wandb/run-20220325_193848-1sz5964i/run-1sz5964i.wandb: 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]
***** train metrics *****
  epoch                    = 5.0
  train_loss               = 2.4987
  train_runtime            = 7:13:00.55
  train_samples            = 28538
  train_samples_per_second = 5.492
  train_steps_per_second   = 0.043
03/26/2022 02:55:51 - INFO - __main__ - *** Evaluate ***
[INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8
[INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
[INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... [INFO|trainer.py:2369] 2022-03-26 02:55:51,800 >> Batch size = 8 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]g-point operations will not be computed-26 02:44:45,865 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`... 
03/26/2022 03:07:45 - INFO - datasets.metric - Removing /home/sanchit_huggingface_co/.cache/huggingface/metrics/wer/default/default_experiment-1-0.arrow
***** eval metrics *****
  epoch = 5.0
  eval_loss = 0.3463
  eval_runtime = 0:11:53.96
  eval_samples = 2642
  eval_samples_per_second = 3.7
  eval_steps_per_second = 0.464
  eval_wer = 0.1012
03/26/2022 03:08:31 - WARNING - huggingface_hub.repository - To https://huggingface.co/sanchit-gandhi/wav2vec2-2-bart-large-cnn
Upload file wandb/run-20220325_193848-1sz5964i/run-1sz5964i.wandb: 100%|█████████████| 218M/218M [00:11<00:00, 20.7MB/s]
  File "/home/sanchit_huggingface_co/gcp/lib/python3.9/site-packages/huggingface_hub/hf_api.py", line 870, in model_info